# Revisiting Arc Midpoints in Complex Numbers

## 1. Synopsis

One of the major headaches of using complex numbers in olympiad geometry problems is dealing with square roots. In particular, it is nontrivial to express the incenter of a triangle inscribed in the unit circle in terms of its vertices.

The following lemma is the standard way to set up the arc midpoints of a triangle. It appears for example as part (a) of Lemma 6.23.

Theorem 1 (Arc midpoint setup for a triangle)

Let ${ABC}$ be a triangle with circumcircle ${\Gamma}$ and let ${M_A}$, ${M_B}$, ${M_C}$ denote the arc midpoints of ${\widehat{BC}}$ opposite ${A}$, ${\widehat{CA}}$ opposite ${B}$, ${\widehat{AB}}$ opposite ${C}$.

Suppose we view ${\Gamma}$ as the unit circle in the complex plane. Then there exist complex numbers ${x}$, ${y}$, ${z}$ such that ${A = x^2}$, ${B = y^2}$, ${C = z^2}$, and

$\displaystyle M_A = -yz, \quad M_B = -zx, \quad M_C = -xy.$

Theorem 1 is often used in combination with the following lemma, which lets one assign the incenter the coordinates ${-(xy+yz+zx)}$ in the above notation.

Lemma 2 (The incenter is the orthocenter of opposite arc midpoints)

Let ${ABC}$ be a triangle with circumcircle ${\Gamma}$ and let ${M_A}$, ${M_B}$, ${M_C}$ denote the arc midpoints of ${\widehat{BC}}$ opposite ${A}$, ${\widehat{CA}}$ opposite ${B}$, ${\widehat{AB}}$ opposite ${C}$. Then the incenter of ${\triangle ABC}$ coincides with the orthocenter of ${\triangle M_A M_B M_C}$.

Unfortunately, the proof of Theorem 1 in my textbook is wrong, and I cannot find a proof online (though I hear that Lemmas in Olympiad Geometry has a proof). So in this post I will give a correct proof of Theorem 1, which will hopefully also explain the mysterious introduction of the minus signs in the theorem statement. In addition I will give a version of the theorem valid for quadrilaterals.

## 2. A Word of Warning

I should at once warn the reader that Theorem 1 is an existence result, and thus must be applied carefully.

To see why this matters, consider the following problem, which appeared as problem 1 of the 2016 JMO.

Example 3 (JMO 2016, by Zuming Feng)

The isosceles triangle ${\triangle ABC}$, with ${AB=AC}$, is inscribed in the circle ${\omega}$. Let ${P}$ be a variable point on the arc ${BC}$ that does not contain ${A}$, and let ${I_B}$ and ${I_C}$ denote the incenters of triangles ${\triangle ABP}$ and ${\triangle ACP}$, respectively. Prove that as ${P}$ varies, the circumcircle of triangle ${\triangle PI_{B}I_{C}}$ passes through a fixed point.

By experimenting with the diagram, it is not hard to guess that the correct fixed point is the midpoint of arc ${\widehat{BC}}$, as seen in the figure below. One might be tempted to write ${A = x^2}$, ${B = y^2}$, ${C = z^2}$, ${P = t^2}$ and assert the two incenters are ${-(xy+yt+xt)}$ and ${-(xz+zt+xt)}$, and that the fixed point is ${-yz}$.

This is a mistake! If one applies Theorem 1 twice, then the choices of “square roots” of the common vertices ${A}$ and ${P}$ may not be compatible. In fact, they cannot be compatible, because the arc midpoint of ${\widehat{AP}}$ opposite ${B}$ is different from the arc midpoint of ${\widehat{AP}}$ opposite ${C}$.

In fact, I claim this is not a minor issue that one can work around. This is because the claim that the circumcircle of ${\triangle P I_B I_C}$ passes through the midpoint of arc ${\widehat{BC}}$ is false if ${P}$ lies on the arc on the same side as ${A}$! In that case it actually passes through ${A}$ instead. Thus the truth of the problem really depends on the fact that the quadrilateral ${ABPC}$ is convex, and any attempt with complex numbers must take this into account to have a chance of working.

## 3. Proof of the theorem for triangles

Fix ${ABC}$ now, so we require ${A = x^2}$, ${B = y^2}$, ${C = z^2}$. There are ${2^3 = 8}$ choices of square roots ${x}$, ${y}$, ${z}$ we can take (differing by a sign); we wish to show one of them works.

We pick an arbitrary choice for ${x}$ first. Then, of the two choices of ${y}$, we pick the one such that ${-xy = M_C}$. Similarly, for the two choices of ${z}$, we pick the one such that ${-xz = M_B}$. Our goal is to show that under these conditions, we have ${M_A = -yz}$ again.

The main trick is to now consider the midpoint of arc ${\widehat{BAC}}$, which we denote by ${L}$. It is easy to see that:

Lemma 4 (The isosceles trapezoid trick)

We have ${\overline{AL} \parallel \overline{M_B M_C}}$ (both are perpendicular to the ${\angle A}$ bisector). Thus ${A L M_B M_C}$ is an isosceles trapezoid, and so ${ A \cdot L = M_B \cdot M_C }$.

Thus, we have

$\displaystyle L = \frac{M_B M_C}{A} = \frac{(-xz)(-xy)}{x^2} = +yz.$

Thus

$\displaystyle M_A = -L = -yz$

as desired.
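As a numerical sanity check of this construction (not a substitute for the proof), here is a minimal Python sketch. The helper names `side`, `arc_midpoint_opposite`, `theorem1_roots` and `incenter`, as well as the sample angles, are made up for illustration. It picks ${x}$ arbitrarily, forces ${y}$ and ${z}$ as above, and then checks Lemma 4, the identity ${M_A = -yz}$, and the incenter formula ${-(xy+yz+zx)}$.

```python
import cmath

def side(p, b, c):
    """Sign of this value tells which side of line bc the point p is on."""
    return ((c - b).conjugate() * (p - b)).imag

def arc_midpoint_opposite(b, c, a):
    """Midpoint of the arc bc of the unit circle that does not contain a."""
    m = cmath.sqrt(b * c)  # m and -m are the midpoints of the two arcs bc
    return m if side(m, b, c) * side(a, b, c) < 0 else -m

def theorem1_roots(A, B, C):
    """Choose x arbitrarily, then y and z as in the proof of Theorem 1."""
    M_B = arc_midpoint_opposite(C, A, B)
    M_C = arc_midpoint_opposite(A, B, C)
    x = cmath.sqrt(A)
    y = -M_C / x  # forces -xy = M_C
    z = -M_B / x  # forces -xz = M_B
    return x, y, z

def incenter(A, B, C):
    """Incenter as the average of the vertices weighted by opposite sides."""
    a, b, c = abs(B - C), abs(C - A), abs(A - B)
    return (a * A + b * B + c * C) / (a + b + c)

# Sample triangle on the unit circle (angles in radians, chosen arbitrarily).
A, B, C = (cmath.exp(1j * s) for s in (0.3, 2.0, 4.5))
x, y, z = theorem1_roots(A, B, C)
M_A = arc_midpoint_opposite(B, C, A)
M_B = arc_midpoint_opposite(C, A, B)
M_C = arc_midpoint_opposite(A, B, C)
L = arc_midpoint_opposite(B, C, M_A)  # midpoint of arc BAC

errors = {
    "B = y^2": abs(y * y - B),
    "C = z^2": abs(z * z - C),
    "A * L = M_B * M_C (Lemma 4)": abs(A * L - M_B * M_C),
    "M_A = -yz": abs(M_A + y * z),
    "incenter = -(xy + yz + zx)": abs(incenter(A, B, C) + (x*y + y*z + z*x)),
}
for name, err in errors.items():
    print(f"{name}: error {err:.1e}")
```

All five errors should be at the level of floating-point rounding. Replacing `x` by `-x` before computing `y` and `z` flips all three signs at once, which is the only freedom the construction leaves.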

From this we can see why the minus signs are necessary.

Exercise 5

Show that Theorem 1 becomes false if we try to use ${+yz}$, ${+zx}$, ${+xy}$ instead of ${-yz}$, ${-zx}$, ${-xy}$.

## 4. A version for quadrilaterals

We now return to the setting of a convex quadrilateral ${ABPC}$ that we encountered in Example 3. Suppose we preserve the variables ${x}$, ${y}$, ${z}$ that we were given from Theorem 1, but now add a fourth complex number ${t}$ with ${P = t^2}$. How are the new arc midpoints determined? The following theorem answers this question.

Theorem 6 (${xytz}$ setup)

Let ${ABPC}$ be a convex quadrilateral inscribed in the unit circle of the complex plane. Then we can choose complex numbers ${x}$, ${y}$, ${z}$, ${t}$ such that ${A = x^2}$, ${B = y^2}$, ${C = z^2}$, ${P = t^2}$ and:

• The opposite arc midpoints ${M_A}$, ${M_B}$, ${M_C}$ of triangle ${ABC}$ are given by ${-yz}$, ${-zx}$, ${-xy}$, as before.
• The midpoint of arc ${\widehat{BP}}$ not including ${A}$ or ${C}$ is given by ${+yt}$.
• The midpoint of arc ${\widehat{CP}}$ not including ${A}$ or ${B}$ is given by ${-zt}$.
• The midpoint of arc ${\widehat{ABP}}$ is ${-xt}$ and the midpoint of arc ${\widehat{ACP}}$ is ${+xt}$.

This setup is summarized in the following figure.

Note that unlike Theorem 1, the four arcs cut out by the sides of ${ABPC}$ do not all have the same sign (I chose ${\widehat{BP}}$ to have coordinates ${+yt}$). This asymmetry is inevitable (see if you can understand why from the proof below).

Proof: We select ${x}$, ${y}$, ${z}$ with Theorem 1. Now, pick a choice of ${t}$ such that ${+yt}$ is the arc midpoint of ${\widehat{BP}}$ not containing ${A}$ and ${C}$. Then the arc midpoint of ${\widehat{CP}}$ not containing ${A}$ or ${B}$ is given by

$\displaystyle \frac{z^2}{-yz} \cdot (+yt) = -zt.$

On the other hand, the calculation of ${-xt}$ for the midpoint of ${\widehat{ABP}}$ follows by applying Lemma 4 again (this time to triangle ${ABP}$). The midpoint of ${\widehat{ACP}}$ is computed similarly. $\Box$
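The signs in the proof are easy to get wrong, so here is a hedged numerical check in Python. The helper names and the sample angles are assumptions made for illustration. It builds ${x}$, ${y}$, ${z}$ as in Theorem 1, picks ${t}$ so that ${+yt}$ is the midpoint of arc ${\widehat{BP}}$, and compares each claimed arc midpoint with a direct geometric computation. It also checks the incenters of ${\triangle ABP}$ and ${\triangle ACP}$ from Example 3, which by Lemma 2 should be the sums of the three relevant arc midpoints, namely ${-xy+yt+xt}$ and ${-xz-zt-xt}$.

```python
import cmath

def side(p, b, c):
    """Sign of this value tells which side of line bc the point p is on."""
    return ((c - b).conjugate() * (p - b)).imag

def arc_midpoint_opposite(b, c, a):
    """Midpoint of the arc bc of the unit circle that does not contain a."""
    m = cmath.sqrt(b * c)
    return m if side(m, b, c) * side(a, b, c) < 0 else -m

def incenter(A, B, C):
    """Incenter as the average of the vertices weighted by opposite sides."""
    a, b, c = abs(B - C), abs(C - A), abs(A - B)
    return (a * A + b * B + c * C) / (a + b + c)

# Convex quadrilateral ABPC: the angles increase in the order A, B, P, C.
A, B, P, C = (cmath.exp(1j * s) for s in (0.3, 2.0, 3.2, 4.5))

# x, y, z as in the proof of Theorem 1.
x = cmath.sqrt(A)
y = -arc_midpoint_opposite(A, B, C) / x
z = -arc_midpoint_opposite(C, A, B) / x
# t is chosen so that +yt is the midpoint of arc BP (away from A and C).
t = arc_midpoint_opposite(B, P, A) / y

errors = {
    "P = t^2": abs(t * t - P),
    "arc BC opposite A = -yz": abs(arc_midpoint_opposite(B, C, A) + y * z),
    "arc CP (no A, B) = -zt": abs(arc_midpoint_opposite(C, P, A) + z * t),
    "arc ABP = -xt": abs(arc_midpoint_opposite(A, P, C) + x * t),
    "arc ACP = +xt": abs(arc_midpoint_opposite(A, P, B) - x * t),
    "incenter of ABP": abs(incenter(A, B, P) - (-x*y + y*t + x*t)),
    "incenter of ACP": abs(incenter(A, C, P) - (-x*z - z*t - x*t)),
}
for name, err in errors.items():
    print(f"{name}: error {err:.1e}")
```

Note that the two incenters use opposite signs for the ${xt}$ term, which is exactly the incompatibility that the naive double application of Theorem 1 runs into.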

In other problems, the four vertices of the quadrilateral may play more symmetric roles and in that case it may be desirable to pick a setup in which the four vertices are labeled ${ABCD}$ in order. By relabeling the letters in Theorem 6 one can prove the following alternate formulation.

Corollary 7

Let ${ABCD}$ be a convex quadrilateral inscribed in the unit circle of the complex plane. Then we can choose complex numbers ${a}$, ${b}$, ${c}$, ${d}$ such that ${A = a^2}$, ${B = b^2}$, ${C = c^2}$, ${D = d^2}$ and:

• The midpoints of ${\widehat{AB}}$, ${\widehat{BC}}$, ${\widehat{CD}}$, ${\widehat{DA}}$ cut out by the sides of ${ABCD}$ are ${-ab}$, ${-bc}$, ${-cd}$, ${+da}$.
• The midpoints of ${\widehat{ABC}}$ and ${\widehat{BCD}}$ are ${+ac}$ and ${+bd}$.
• The midpoints of ${\widehat{CDA}}$ and ${\widehat{DAB}}$ are ${-ac}$ and ${-bd}$.

To test the newfound theorem, here is a cute easy application.

Example 8 (Japanese theorem for cyclic quadrilaterals)

In a cyclic quadrilateral ${ABCD}$, the incenters of ${\triangle ABC}$, ${\triangle BCD}$, ${\triangle CDA}$, ${\triangle DAB}$ are the vertices of a rectangle.
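Here is a sketch of how Corollary 7 handles this, assuming the vertices are labeled in order around the circle. It uses Lemma 2 together with the standard fact that the orthocenter of a triangle inscribed in the unit circle is the sum of its vertices. Writing ${I_{ABC}}$ for the incenter of ${\triangle ABC}$ and so on (notation introduced only for this sketch), each incenter is the sum of three arc midpoints read off from Corollary 7:

$\displaystyle I_{ABC} = -(ab+bc+ca), \quad I_{BCD} = -(bc+cd+db), \quad I_{CDA} = ac-cd+da, \quad I_{DAB} = bd+da-ab.$

Then

$\displaystyle I_{ABC} + I_{CDA} = -ab-bc-cd+da = I_{BCD} + I_{DAB},$

so the two diagonals share a midpoint and the four incenters form a parallelogram. Two adjacent sides are

$\displaystyle I_{ABC} - I_{BCD} = (d-a)(b+c), \qquad I_{BCD} - I_{CDA} = -(a+b)(c+d).$

Since ${a}$, ${b}$, ${c}$, ${d}$ lie on the unit circle, conjugating replaces each of them by its reciprocal, and one checks that

$\displaystyle w = \frac{(d-a)(b+c)}{(a+b)(c+d)} \quad\text{satisfies}\quad \overline{w} = \frac{(a-d)(b+c)}{(a+b)(c+d)} = -w.$

Thus ${w}$ is purely imaginary, the adjacent sides are perpendicular, and the parallelogram is a rectangle.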

In a previous post I tried to make the point that math olympiads should not be judged by their relevance to research mathematics. In doing so I failed to actually explain why I think math olympiads are a valuable experience for high schoolers, so I want to make amends here.

## 1. Summary

In high school I used to think that math contests were primarily meant to encourage contestants to study some math that is (much) more interesting than what’s typically shown in high school. While I still think this is one goal, and maybe it still is the primary goal in some people’s minds, I no longer believe this is the primary benefit.

My current belief is that there are two major benefits from math competitions:

1. To build a social network for gifted high school students with similar interests.
2. To provide a challenging experience that lets gifted students grow and develop intellectually.

I should at once disclaim that I do not claim these are the only purposes of mathematical olympiads. Indeed, mathematics is a beautiful subject and introducing competitors to this field of study is of course a great thing (in particular it was life-changing for me). But as I have said before, many alumni of math olympiads do not eventually become mathematicians, and so in my mind I would like to make the case that these alumni have gained a lot from the experience anyways.

## 2. Social experience

Now that we have email, Facebook, Art of Problem Solving, and whatnot, the math contest community is much larger and stronger than it’s ever been in the past. For the first time, it’s really possible to stay connected with other competitors throughout the entire year, rather than just seeing each other a handful of times during contest season. There are literally group chats of contestants all over the country where people talk about math problems or the solar eclipse or share funny pictures or inside jokes or anything else. In many ways, being part of the high school math contest community is a lot like having access to the peer group at a top-tier university, except four years earlier.

There’s some concern that a competitive culture is unhealthy for the contestants. I want to make a brief defense here.

I really do think that the contest community is good at being collaborative rather than competitive. You can imagine a world where the competitors think about contests in terms of trying to get a better score than the other person. [1] That would not be a good world. But I think by and large the community is good at thinking about it as just trying to maximize their own score. The score of the person next to you isn’t supposed to matter (and thinking about it doesn’t help, anyways).

Put more bluntly, on contest day, you have one job: get full marks. [2]

Because we have a culture of this shape, we now get a group of talented students all working towards the same thing, rather than against one another. That’s what makes it possible to have a self-supportive community, and what makes it possible for the contestants to really become friends with each other.

I think the strongest contestants don’t even care about the results of contests other than the few really important ones (like USAMO/IMO). It is a long-running joke that the Harvard-MIT Math Tournament is secretly just a MOP reunion, and I personally see to it that this happens every year. [3]

I’ve also heard similar sentiments about ARML:

I enjoy ARML primarily based on the social part of the contest, and many people agree with me; the highlight of ARML for some people is the long bus ride to the contest. Indeed, I think of ARML primarily as a social event, with some mathematics to make it look like the participants are actually doing something important.

(Don’t tell the parents.)

## 3. Intellectual growth

My view is that if you spend a lot of time thinking about or working on anything deep, then you will learn and grow from the experience, almost regardless of what that thing is at an object level. Take chess as an example: even though chess definitely has even fewer “real-life applications” than math, if you take anyone with a 2000+ rating I don’t think many of them would think that the time they invested into the game was wasted. [4]

Olympiad mathematics seems to be no exception to this. In fact the sheer depth and difficulty of the subject probably makes it a particularly good example. [5]

I’m now going to fill this section with a bunch of examples although I don’t claim the list is exhaustive. First, here are the ones that everyone talks about and more or less agrees on:

• Learning how to think, because, well, that’s how you solve a contest problem.
• Learning to work hard and not give up, because the contest is difficult and you will not win by accident; you need to actually go through a lot of training.
• Dual to above, learning to give up on a problem, because sometimes the problem really is too hard for you and you won’t solve it even if you spend another ten or twenty or fifty hours, and you have to learn to cut your losses. There is a balancing act here that I think really is best taught by experience, rather than the standard high-school moral cheerleading where you are supposed to “never give up” or something.
• But also learning to be humble or to ask for help, which is a really hard thing for a lot of young contestants to do.
• Learning to be patient, not only with solving problems but with the entire journey. You usually do not improve dramatically overnight.

Here are some others I also believe, but don’t hear as often.

• Learning to be independent, because odds are your high-school math teacher won’t be able to help you with USAMO problems. Training for the highest level of contests is these days almost always done more or less independently. I think having the self-motivation to do the training yourself, as well as the capacity to essentially have to design your own training (making judgments on what to work on, et cetera) is itself a valuable cross-domain skill. (I’m a little sad sometimes that by teaching I deprive my students of the opportunity to practice this. It is a cost.)
• Being able to work neatly, not because your parents told you to but because if you are sloppy then it will cost you points when you make small (or large) errors on IMO #1. Olympiad problems are difficult enough as is, and you do not want to let them become any harder because of your own sloppiness. (And there are definitely examples of olympiad problems which are impossible to solve if you are not organized.)
• Being able to organize and write your thoughts well, because some olympiad problems are complex and require putting more than one lemma or idea together to solve. For this to work, you need to have the skill of putting together a lot of moving parts into a single coherent argument. Bonus points here if your audience is someone you care about (as opposed to a grader), because then you have to also worry about making the presentation as clean and natural as possible.

These days, whenever I solve a problem I always take the time to write it up cleanly, because in the process of doing so I nearly always find ways that the solution can be made shorter or more elegant, or at least philosophically more natural. (I also often find my solution is wrong.) So it seems that the write-up process here is not merely about presenting the same math in different ways: the underlying math really does change. [6]

• Thinking about how to learn. For example, the Art of Problem Solving forums are often filled with questions of the form “what should I do?”. Many older users find these questions obnoxious, but I find them desirable. I think being able to spend time pondering about what makes people improve or learn well is a good trait to develop, rather than mindlessly doing one book after another.

Of course, many of the questions I referred to are poor, with no real specific direction: often the questions are essentially “what book should I read?” or “give me an exhaustive list of everything I should know”. But I think this is inevitable because these are people’s first attempts at understanding contest training. Just like the first difficult math contest you take often goes quite badly, the first time you try to think about learning, you will probably ask questions you will be embarrassed about in five years. My hope is that as these younger users get older and wiser, the questions and thoughts become mature as well. To this end I do not mind seeing people wobble on their first steps.

• Being honest with your own understanding, particularly of fundamentals. When watching experienced contestants, you often see people solving problems using advanced techniques like Brianchon’s theorem or the n-1 equal value principle or whatever. It’s tempting to think that if you learn the names and statements of all these advanced techniques then you’ll be able to apply them too. But the reality is that these techniques are advanced for a reason: they are hard to use without mastery of fundamentals.

This is something I definitely struggled with as a contestant: being forced to patiently learn all the fundamentals and not worry about the fancy stuff. To give an example, the 2011 JMO featured an inequality which was routine for experienced or well-trained contestants, but “almost impossible for people who either have not seen inequalities at all or just like to compile famous names in their proofs”. I was in the latter category, and tried to make up a solution using multivariable Jensen, whatever that meant. Only when I was older did I really understand what I was missing.

• Dual to the above, once you begin to master something completely you start to learn what different depths of understanding feel like, and an appreciation for just how much effort goes into developing a mastery of something.
• Being able to think about things which are not well-defined. This one often comes as a surprise to people, since math is a field which is known for its precision. But I still maintain that this is a skill contests train for.

A very simple example is a question like, “when should I use the probabilistic method?”. Yes, we know it’s good for existence questions, but can we say anything more about when we expect it to work? Well, one heuristic (not the only one) is “if a monkey could find it”: the idea that a randomly selected object “should” work. But obviously something like this can’t be subject to a (useful) formal definition that works 100% of the time, and there are plenty of contexts in which even informally this heuristic gives the wrong answer. So that’s an example of a vague and nebulous concept that’s nonetheless necessary in order to understand the probabilistic method well.

There are much more general examples one can say. What does it mean for a problem to “feel projective”? I can’t tell you a hard set of rules; you’ll have to do a bunch of examples and gain the intuition yourself. Why do I say this problem is “rigid”? Same answer. How do you tell which parts of this problem are natural, and which are artificial? How do you react if you have the feeling the problem gives you nothing to work with? How can you tell if you are making progress on a problem? Trying to figure out partial answers to these questions, even if they can’t be put in words, will go a long way in improving the mythical intuition that everyone knows is so important.

It might not be unreasonable to say that by this point we are studying philosophy, and that’s exactly what I intend. When I teach now I often make a point of referring to the “morally correct” way of thinking about things, or making a point of explaining why X should be true, rather than just providing a proof. I find this type of philosophy interesting in its own right, but that is not the main reason I incorporate it into my teaching. I teach the philosophy now because it is necessary, because you will solve fewer problems without that understanding.

## 4. I think if you don’t do well, it’s better to you

But I think the most surprising benefit of math contests is that most participants won’t win. In high school everyone tells you that if you work hard you will succeed. The USAMO is a fantastic counterexample to this. Every year, there are exactly 12 winners on the USAMO. I can promise you there are far more than 12 people who work very hard every year with the hope of doing well on the USAMO. Some people think this is discouraging, but I find it desirable.

Let me tell you a story.

Back in September of 2015, I sneaked in to the parents’ talk at Math Prize for Girls, because Zuming Feng was speaking and I wanted to hear what he had to say. (The whole talk is available on YouTube now.) The talk had a lot of different parts that I liked, but one of them struck me in particular, when he recounted something he said to one of his top students:

I really want you to work hard, but I really think if you don’t do well, if you fail, it’s better to you.

I had a hard time relating to this when I first heard it, but it makes sense if you think about it. What I’ve tried to argue is that the benefit of math contests is not that the contestant can now solve N problems on USAMO in late April, but what you gain from the entire year of practice. And so if you hold the other 363 days fixed, and then vary only the final outcome of the USAMO, which of success and failure is going to help a contestant develop more as a person?

For that reason I really like to think that the final lesson from high school olympiads is how to appreciate the entire journey, even in spite of the eventual outcome.

### Footnotes

1. I actually think this is one of the good arguments in favor of the new JMO/USAMO system introduced in 2010. Before this, it was not uncommon for participants in 9th and 10th grade to really only aim for solving one or two entry-level USAMO problems to qualify for MOP. To this end I think the mentality of “the cutoff will probably only be X, so give up on solving problem six” is sub-optimal.
2. That’s a Zuming quote.
3. Which is why I think the HMIC is actually sort of pointless from a contestant’s perspective, but it’s good logistics training for the tournament directors.
4. I could be wrong about people thinking chess is a good experience, given that I don’t actually have any serious chess experience beyond knowing how the pieces move. A cursory scan of the Internet suggests otherwise (was surprised to find that Ben Franklin has an opinion on this) but it’s possible there are people who think chess is a waste of time, and are merely not as vocal as the people who think math contests are a waste of time.
5. Relative to what many high school students work on, not compared to research or something.
6. Privately, I think that working in math olympiads taught me way more about writing well than English class ever did; English class always felt to me like the skill of trying to sound like I was saying something substantial, even when I wasn’t.

# A story of block-ascending permutations

I recently had a combinatorics paper appear in the EJC. In this post I want to brag a bit by telling the “story” of this paper: what motivated it, how I found the conjecture that I originally did, and the process that eventually led me to the proof, and so on.

This work was part of the Duluth REU 2017, and I thank Joe Gallian for suggesting the problem.

## 1. Background

Let me begin by formulating the problem as it was given to me. First, here is the definition and notation for a “block-ascending” permutation.

Definition 1

For nonnegative integers ${a_1}$, …, ${a_n}$ an ${(a_1, \dots, a_n)}$-ascending permutation is a permutation on ${\{1, 2, \dots, a_1 + \dots + a_n\}}$ whose descent set is contained in ${\{a_1, a_1+a_2, \dots, a_1+\dots+a_{n-1}\}}$. In other words the permutation ascends in blocks of length ${a_1}$, ${a_2}$, …, ${a_n}$, and thus has the form

$\displaystyle \pi = \pi_{11} \dots \pi_{1a_1} | \pi_{21} \dots \pi_{2a_2} | \dots | \pi_{n1} \dots \pi_{na_n}$

for which ${\pi_{i1} < \pi_{i2} < \dots < \pi_{ia_i}}$ for all ${i}$.

It turns out that block-ascending permutations which also avoid an increasing subsequence of certain length have nice enumerative properties. To this end, we define the following notation.

Definition 2

Let ${\mathcal L_{k+2}(a_1, \dots, a_n)}$ denote the set of ${(a_1, \dots, a_n)}$-ascending permutations which avoid the pattern ${12 \dots (k+2)}$.

(The reason for using ${k+2}$ will be explained later.) In particular, ${\mathcal L_{k+2}(a_1 ,\dots, a_n) = \varnothing}$ if ${\max \{a_1, \dots, a_n\} \ge k+2}$.

Example 3

Here is a picture of a permutation in ${\mathcal L_7(3,2,4)}$ (but not in ${\mathcal L_6(3,2,4)}$, since one can see an increasing length ${6}$ subsequence shaded). We would denote it ${134|69|2578}$.
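To make Definitions 1 and 2 concrete, here is a small Python sketch of a membership test. The function names `lis_length` and `in_L` are made up for illustration, and the longest increasing subsequence is computed with the usual patience-sorting trick. It is applied to the permutation ${134|69|2578}$ from this example.

```python
from bisect import bisect_left

def lis_length(seq):
    """Length of the longest strictly increasing subsequence of seq."""
    tails = []  # tails[i] = smallest possible tail of an increasing run of length i+1
    for v in seq:
        i = bisect_left(tails, v)
        if i == len(tails):
            tails.append(v)
        else:
            tails[i] = v
    return len(tails)

def in_L(k_plus_2, blocks, perm):
    """True if perm is blocks-ascending and avoids the pattern 12...(k+2)."""
    if sorted(perm) != list(range(1, sum(blocks) + 1)):
        return False  # not a permutation of the right length
    start = 0
    for length in blocks:
        block = perm[start:start + length]
        if any(u >= v for u, v in zip(block, block[1:])):
            return False  # a descent inside a block
        start += length
    return lis_length(perm) < k_plus_2

perm = [1, 3, 4, 6, 9, 2, 5, 7, 8]  # the permutation 134|69|2578
print(in_L(7, (3, 2, 4), perm), in_L(6, (3, 2, 4), perm))  # prints: True False
```

The longest increasing subsequence here has length ${6}$ (for instance ${1, 3, 4, 5, 7, 8}$), which is why the permutation lies in ${\mathcal L_7(3,2,4)}$ but not in ${\mathcal L_6(3,2,4)}$.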

Now on to the results. A 2011 paper by Joel Brewster Lewis (JBL) proved (among other things) the following result:

Theorem 4 (Lewis 2011)

The sets ${\mathcal L_{k+2}(k,k,\dots,k)}$ and ${\mathcal L_{k+2}(k+1,k+1,\dots,k+1)}$ are in bijection with Young tableaux of shape ${\left< (k+1)^n \right>}$.

Remark 5

When ${k=1}$, this implies that ${\mathcal L_3(1,1,\dots,1)}$, which is the set of ${123}$-avoiding permutations of length ${n}$, is counted by the Catalan numbers; so is ${\mathcal L_3(2,\dots,2)}$, which is the set of ${123}$-avoiding zig-zag permutations.

Just before the Duluth REU in 2017, Mei and Wang proved that in fact, in Lewis’ result one may freely mix ${k}$’s and ${k+1}$’s. To simplify notation,

Definition 6

Let ${I \subseteq \left\{ 1,\dots,n \right\}}$. Then ${\mathcal L(n,k,I)}$ denotes ${\mathcal L_{k+2}(a_1,\dots,a_n)}$ where

$\displaystyle a_i = \begin{cases} k+1 & i \in I \\ k & i \notin I. \end{cases}$

Theorem 7 (Mei, Wang 2017)

The ${2^n}$ sets ${\mathcal L(n,k,I)}$ are also in bijection with Young tableaux of shape ${\left< (k+1)^n \right>}$.
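For small parameters the equal sizes promised by Theorem 7 can be observed by brute force. The following Python sketch is only an illustration under my own naming (`block_ascending`, `count_L` and `sizes` are hypothetical helpers); it counts ${\mathcal L(n,k,I)}$ for every subset ${I}$ and collects the distinct sizes that occur.

```python
from bisect import bisect_left
from itertools import combinations, product

def lis_length(seq):
    """Length of the longest strictly increasing subsequence of seq."""
    tails = []
    for v in seq:
        i = bisect_left(tails, v)
        if i == len(tails):
            tails.append(v)
        else:
            tails[i] = v
    return len(tails)

def block_ascending(blocks):
    """Yield each permutation of 1..sum(blocks) that ascends within every block."""
    def rec(remaining, idx):
        if idx == len(blocks):
            yield ()
            return
        # combinations() of a sorted tuple yields sorted tuples, i.e. ascending blocks
        for chosen in combinations(remaining, blocks[idx]):
            rest = tuple(v for v in remaining if v not in chosen)
            for tail in rec(rest, idx + 1):
                yield chosen + tail
    yield from rec(tuple(range(1, sum(blocks) + 1)), 0)

def count_L(n, k, I):
    """Size of L(n, k, I): block i has length k+1 if i is in I, else k."""
    blocks = tuple(k + 1 if i in I else k for i in range(1, n + 1))
    return sum(1 for p in block_ascending(blocks) if lis_length(p) < k + 2)

def sizes(n, k):
    """Set of distinct values of |L(n, k, I)| over all 2^n subsets I."""
    subsets = [
        {i for i, bit in zip(range(1, n + 1), bits) if bit}
        for bits in product((0, 1), repeat=n)
    ]
    return {count_L(n, k, I) for I in subsets}

print(sizes(3, 1), sizes(2, 2))  # prints: {5} {5}
```

Both shapes ${\left< 2^3 \right>}$ and ${\left< 3^2 \right>}$ have exactly ${5}$ standard Young tableaux, matching the single size that shows up in each case.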

The proof uses the RSK correspondence, but the authors posed at the end of the paper the following open problem:

Problem

Find a direct bijection between the ${2^n}$ sets ${\mathcal L(n,k,I)}$ above, not involving the RSK correspondence.

This was the first problem that I was asked to work on. (I remember I received the problem on Sunday morning; this actually matters a bit for the narrative later.)

At this point I should pause to mention that this ${\mathcal L_{k+2}(\dots)}$ notation is my own invention, and did not exist when I originally started working on the problem. Indeed, all the results are restricted to the case where ${a_i \in \{k,k+1\}}$ for each ${i}$, and so it was unnecessary to think about other possibilities for ${a_i}$: Mei and Wang’s paper uses the notation ${\mathcal L(n,k,I)}$. So while I’ll continue to use the ${\mathcal L_{k+2}(\dots)}$ notation in the blog post for readability, it will make some of the steps more obvious than they actually were.

## 2. Setting out

Mei and Wang’s paper originally suggested that rather than finding a bijection ${\mathcal L(n,k,I) \rightarrow \mathcal L(n,k,J)}$ for any ${I}$ and ${J}$, it would suffice to biject

$\displaystyle \mathcal L(n,k,I) \rightarrow \mathcal L(n,k,\varnothing)$

and then compose two such bijections. I didn’t see why this should be much easier, but it didn’t seem to hurt either.

As an example, they show how to do this bijection with ${I = \{1\}}$ and ${I = \{n\}}$. Indeed, suppose ${I = \{1\}}$. Then ${\pi_{11} < \pi_{12} < \dots < \pi_{1(k+1)}}$ is an increasing sequence of length ${k+1}$ right at the start of ${\pi}$. So ${\pi_{1(k+1)}}$ had better be the largest element in the permutation: otherwise later in ${\pi}$ the biggest element would complete an increasing subsequence of length ${k+2}$ already! So removing ${\pi_{1(k+1)}}$ gives a bijection ${\mathcal L(n,k,\{1\}) \rightarrow \mathcal L(n,k,\varnothing)}$.

But if you look carefully, this proof does essentially nothing with the later blocks. The exact same proof gives:

Proposition 8

Suppose ${1 \notin I}$. Then there is a bijection

$\displaystyle \mathcal L(n,k,I \cup \{1\}) \rightarrow \mathcal L(n,k,I)$

by deleting the ${(k+1)}$st element of the permutation (which must be the largest one).
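Proposition 8 is simple enough to test exhaustively on a small case. In the Python sketch below (helper names are hypothetical, and the choice ${k=2}$ with blocks ${(3,2,3) \rightarrow (2,2,3)}$ is just a sample), every permutation in the source set has its ${(k+1)}$st entry deleted, and the image is compared with the target set.

```python
from bisect import bisect_left
from itertools import combinations

def lis_length(seq):
    """Length of the longest strictly increasing subsequence of seq."""
    tails = []
    for v in seq:
        i = bisect_left(tails, v)
        if i == len(tails):
            tails.append(v)
        else:
            tails[i] = v
    return len(tails)

def block_ascending(blocks):
    """Yield each permutation of 1..sum(blocks) that ascends within every block."""
    def rec(remaining, idx):
        if idx == len(blocks):
            yield ()
            return
        for chosen in combinations(remaining, blocks[idx]):
            rest = tuple(v for v in remaining if v not in chosen)
            for tail in rec(rest, idx + 1):
                yield chosen + tail
    yield from rec(tuple(range(1, sum(blocks) + 1)), 0)

def L(k_plus_2, blocks):
    """All blocks-ascending permutations avoiding 12...(k+2), as tuples."""
    return [p for p in block_ascending(blocks) if lis_length(p) < k_plus_2]

def delete_largest(perm, k):
    """Delete the (k+1)st entry, which must be the largest element."""
    assert perm[k] == len(perm)  # fails if the entry is not the maximum
    return perm[:k] + perm[k + 1:]

k = 2
source = L(k + 2, (k + 1, 2, 3))  # first block has length k+1
target = L(k + 2, (k, 2, 3))      # first block has length k
image = {delete_largest(p, k) for p in source}
print(len(source), len(target), image == set(target))
```

Because the deleted entry is always the maximum, what remains is already a permutation of ${\{1, \dots, N-1\}}$ and no relabeling is needed.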

Once I found this proposition I rejected the initial suggestion of specializing ${\mathcal L(n,k,I) \rightarrow \mathcal L(n,k,\varnothing)}$. The “easy case” I had found told me that I could take a set ${I}$ and delete the single element ${1}$ from it. So empirically, my intuition from this toy example told me that it would be easier to find bijections ${\mathcal L(n,k,I) \rightarrow \mathcal L(n,k,I')}$ where ${I'}$ and ${I}$ were only “a little different”, and hope that the resulting bijection only changed things a little bit (in the same way that in the toy example, all the bijection did was delete one element). So I shifted to trying to find small changes of this form.

## 3. The fork in the road

### 3.1. Wishful thinking

I had a lucky break of wishful thinking here. In the notation ${\mathcal L_{k+2}(a_1, \dots, a_n)}$ with ${a_i \in \{k,k+1\}}$, I had found that one could replace ${a_1}$ with either ${k}$ or ${k+1}$ freely. (But this proof relied heavily on the block really being on the far left.) So what other changes might I be able to make?

There were two immediate possibilities that came to my mind.

• Deletion: We already showed ${a_1}$ could be changed from ${k+1}$ to ${k}$. If we can do a similar deletion with ${a_i}$ for any ${i}$, not just ${i=1}$, then we would be done.
• Swapping: If we can show that two adjacent ${a_i}$’s could be swapped, that would be sufficient as well. (It’s also possible to swap non-adjacent ${a_i}$’s, but that would cause more disruption for no extra benefit.)

Now, I had two paths that both seemed plausible to chase after. How was I supposed to know which one to pick? (Of course, it’s possible neither work, but you have to start somewhere.)

Well, maybe the correct thing to do would have been to just try both. But it was Sunday afternoon by the time I got to this point. Granted, it was summer already, but I knew that come Monday I would have doctor appointments and other trivial errands to distract me, so I decided I should pick one of them and throw the rest of the day into it. But that meant I had to pick one.

(I confess that I actually already had a prior guess: the deletion approach seemed less likely to work than the swapping approach. In the deletion approach, if ${i}$ is somewhere in the middle of the permutation, it seemed like deleting an element could cause a lot of disruption. But the swapping approach preserved the total number of elements involved, and so seemed more likely that I could preserve structure. But really I was just grasping at straws.)

### 3.2. Enter C++

Yeah, I cheated. Sorry.

Those of you that know anything about my style of math know that I am an algebraist by nature — sort of. It’s more accurate to say that I depend on having concrete examples to function. True, I can’t do complexity theory for my life, but I also haven’t been able to get the hang of algebraic geometry, despite having tried to learn it three or four times by now. But enumerative combinatorics? OH LOOK EXAMPLES.

Here’s the plan: let ${k=3}$. Then using a C++ computer program:

• Enumerate all the permutations in ${S = \mathcal L_{k+2}(3,4,3,4)}$.
• Enumerate all the permutations in ${A = \mathcal L_{k+2}(3,3,3,4)}$.
• Enumerate all the permutations in ${B = \mathcal L_{k+2}(3,3,4,4)}$.

If the deletion approach is right, then I would hope ${S}$ and ${A}$ look pretty similar. On the flip side, if the swapping approach is right, then ${S}$ and ${B}$ should look close to each other instead.

It’s moments like this where my style of math really shines. I don’t have to make decisions like the above off gut-feeling: do the “data science” instead.

### 3.3. A twist of fate

Except this isn’t actually what I did, since there was one problem. Computing the longest increasing subsequence of a length ${N}$ permutation takes ${O(N \log N)}$ time, and there are ${N!}$ or so permutations. But when ${N = 3+4+3+4=14}$, we have ${N! \cdot N \log N \approx 3 \cdot 10^{12}}$, which is a pretty big number. Unfortunately, my computer is not really that fast, and I didn’t really have the patience to implement the “correct” algorithms to bring the runtime down.

The solution? Use ${N = 1+4+3+2 = 10}$ instead.

In a deep irony that I didn’t realize at the time, it was this moment when I introduced the ${\mathcal L_{k+2}(a_1, \dots, a_n)}$ notation, and for the first time allowed the ${a_i}$ to not be in ${\{k,k+1\}}$. My reasoning was that since I was only doing this for heuristic reasons, I could instead work with ${S = \mathcal L_{k+2}(1,4,3,2)}$ and probably not change much about the structure of the problem, while replacing ${N}$ by ${1 + 4 + 3 + 2 = 10}$, which would run more than ${1000}$ times faster. This was okay since all I wanted to do was see how much changing the “middle” would disrupt the structure.

And so the new plan was:

• Enumerate all the permutations in ${S = \mathcal L_{k+2}(1,4,3,2)}$.
• Enumerate all the permutations in ${A = \mathcal L_{k+2}(1,3,3,2)}$.
• Enumerate all the permutations in ${B = \mathcal L_{k+2}(1,3,4,2)}$.

I admit I never actually ran the enumeration with ${A}$, because the route with ${S}$ and ${B}$ turned out to be even more promising than I expected. When I compared the empirical data for the sets ${S}$ and ${B}$, I found that the number of permutations with any particular triple ${(\pi_1, \pi_9, \pi_{10})}$ was the same in both. In other words, the outer blocks were preserved: the bijection

$\displaystyle \mathcal L_{k+2}(1,4,3,2) \rightarrow \mathcal L_{k+2}(1,3,4,2)$

does not tamper with the outside blocks of length ${1}$ and ${2}$.

This meant I was ready to make the following conjecture. Suppose ${a_i = k}$, ${a_{i+1} = k+1}$. There is a bijection

$\displaystyle \mathcal L_{k+2}(a_1, \dots, a_i, a_{i+1}, \dots, a_n) \rightarrow \mathcal L_{k+2}(a_1, \dots, a_{i+1}, a_{i}, \dots, a_n)$

which only involves rearranging the elements of the ${i}$th and ${(i+1)}$st blocks.

## 4. Rooting out the bijection

At this point I was in quite a good position. I had pinned the problem down to finding a particular bijection that I was confident had to exist, since it was showing up in the empirical data.

Let’s call this mythical bijection ${\mathbf W}$. How could I figure out what it was?

### 4.1. Hunch: ${\mathbf W}$ preserves order-isomorphism

Let me quickly introduce a definition.

Definition 9

We say two words ${a_1 \dots a_m}$ and ${b_1 \dots b_m}$ are order-isomorphic if, for all ${i}$ and ${j}$, we have ${a_i < a_j}$ if and only if ${b_i < b_j}$. Then order-isomorphism gives equivalence classes, and there is a canonical representative where the letters are ${\{1,2,\dots,m\}}$; this is called a reduced word.

Example 10

The words ${13957}$, ${12846}$ and ${12534}$ are order-isomorphic; the last is reduced.
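To make the definition concrete, here is a small Python sketch that computes the reduced word of a word with distinct letters. The function name `reduce_word` is just an illustrative choice.

```python
def reduce_word(word):
    """Return the reduced word order-isomorphic to `word`.

    `word` is a sequence of distinct comparable letters; the result uses
    the letters 1, 2, ..., len(word) in the same relative order.
    """
    # Rank each letter by its position in sorted order (1-based).
    rank = {letter: r for r, letter in enumerate(sorted(word), start=1)}
    return [rank[letter] for letter in word]
```

Applied to the words of Example 10, `reduce_word([1, 3, 9, 5, 7])` and `reduce_word([1, 2, 8, 4, 6])` both give `[1, 2, 5, 3, 4]`.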

Now I guessed one more property of ${\mathbf W}$: this ${\mathbf W}$ should preserve order-isomorphism.

What do I mean by this? Suppose in one context ${135 | 79}$ changed to ${35 | 179}$; then we would expect that in another situation we should have ${124 | 68}$ changing to ${24 | 168}$, since these words are order-isomorphic. Indeed, we expect ${\mathbf W}$ (empirically) to not touch surrounding outside blocks, and so it would be very strange if ${\mathbf W}$ behaved differently due to far-away numbers it wasn’t even touching.

So actually I’ll just write

$\displaystyle \mathbf W(123|45) = 23|145$

for this example, reducing the words in question.

### 4.2. Keep cheating

With this hunch it’s possible to cheat with C++ again. Here’s how.

Let’s for concreteness suppose ${k=2}$ and the particular sets

$\displaystyle \mathcal L_{k+2}(1,3,2,1) \rightarrow \mathcal L_{k+2}(1,2,3,1).$

Well, it turns out if you look at the data:

• The only element of ${\mathcal L_{k+2}(1,3,2,1)}$ which starts with ${2}$ and ends with ${5}$ is ${2|147|36|5}$.
• The only element of ${\mathcal L_{k+2}(1,2,3,1)}$ which starts with ${2}$ and ends with ${5}$ is ${2|47|136|5}$.

So that means that ${147 | 36}$ is changed to ${47 | 136}$. Thus the empirical data shows that

$\displaystyle \mathbf W(135|24) = 35|124.$
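The kind of lookup used above can be sketched in a few lines of Python. This is only a brute-force illustration, and it rests on an assumption: the definition of ${\mathcal L_{k+2}(a_1, \dots, a_n)}$ is not restated in this section, so the code takes it to mean permutations whose consecutive blocks of lengths ${a_1, \dots, a_n}$ are each increasing and which have no increasing subsequence of length ${k+2}$. If the intended definition also demands a descent between blocks, `in_L` would need one more check. All function names here are hypothetical.

```python
from bisect import bisect_left
from itertools import permutations

def lis_length(seq):
    """Length of the longest strictly increasing subsequence."""
    tails = []
    for x in seq:
        j = bisect_left(tails, x)
        if j == len(tails):
            tails.append(x)
        else:
            tails[j] = x
    return len(tails)

def blocks_of(perm, sizes):
    """Cut perm into consecutive blocks of the given sizes."""
    out, start = [], 0
    for s in sizes:
        out.append(perm[start:start + s])
        start += s
    return out

def in_L(perm, sizes, k):
    """ASSUMED membership test: increasing blocks, no increasing
    subsequence of length k + 2."""
    increasing = all(list(b) == sorted(b) for b in blocks_of(perm, sizes))
    return increasing and lis_length(perm) < k + 2

def slice_of_L(sizes, k, first, last):
    """All permutations in the assumed set that start with `first`
    and end with `last` (brute force; only sensible for small N)."""
    n = sum(sizes)
    return [p for p in permutations(range(1, n + 1))
            if p[0] == first and p[-1] == last and in_L(p, sizes, k)]
```

With ${k=2}$, `slice_of_L((1, 3, 2, 1), 2, 2, 5)` contains the single permutation ${2|147|36|5}$ and `slice_of_L((1, 2, 3, 1), 2, 2, 5)` contains only ${2|47|136|5}$, matching the two bullets above.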

In general, it might not be that clear cut. For example, if we look at the permutations starting with ${2}$ and ending with ${4}$, there is more than one.

• ${2 | 1 5 7 | 3 6 | 4}$ and ${2 | 1 6 7 | 3 5 | 4}$ are both in ${\mathcal L_{k+2}(1,3,2,1)}$.
• ${2 | 5 7 | 1 3 6 | 4}$ and ${2 | 6 7 | 1 3 5 | 4}$ are both in ${\mathcal L_{k+2}(1,2,3,1)}$.

Thus

$\displaystyle \mathbf W( \{135|24, 145|23\} ) = \{35|124, 45|123\}$

but we can’t tell which one goes to which (although you might be able to guess).

Fortunately, there is lots of data. This example narrowed ${135|24}$ down to two values, but if you look at other places you might have different data on ${135|24}$. Since we think ${\mathbf W}$ is behaving the same “globally”, we can piece together different pieces of data to get narrower sets. Even better, ${\mathbf W}$ is a bijection, so once we match either of ${135|24}$ or ${145|23}$, we’ve matched the other.

You know what this sounds like? Perfect matchings.

So here’s the experimental procedure.

• Enumerate all permutations in ${\mathcal L_{k+2}(2,3,4,2)}$ and ${\mathcal L_{k+2}(2,4,3,2)}$.
• Take each possible tuple ${(\pi_1, \pi_2, \pi_{10}, \pi_{11})}$, and look at the permutations that start and end with those particular four elements. Record the reductions of ${\pi_3\pi_4\pi_5|\pi_6\pi_7\pi_8\pi_9}$ (for the first set) and ${\pi_3\pi_4\pi_5\pi_6|\pi_7\pi_8\pi_9}$ (for the second set) for all these permutations. We call these input words and output words, respectively. Each output word is a “candidate” value of ${\mathbf W}$ for each input word recorded with the same tuple.
• For each input word ${a_1a_2a_3|b_1b_2b_3b_4}$ that appeared, take the intersection of its candidate sets over all the tuples in which it appeared. This gives a bipartite graph ${G}$, with input words joined to their remaining candidates.
• Find perfect matchings of the graph.

And with any luck that would tell us what ${\mathbf W}$ is.

### 4.3. Results

Luckily, the bipartite graph is quite sparse, and there was only one perfect matching.

246|1357 => 2467|135
247|1356 => 2457|136
256|1347 => 2567|134
257|1346 => 2357|146
267|1345 => 2367|145
346|1257 => 3467|125
347|1256 => 3457|126
356|1247 => 3567|124
357|1246 => 1357|246
367|1245 => 1367|245
456|1237 => 4567|123
457|1236 => 1457|236
467|1235 => 1467|235
567|1234 => 1567|234


If you look at the data, well, there are some clear patterns. Exactly one number is “moving” over from the right half, each time. Also, if ${7}$ is on the right half, then it always moves over.

Anyways, if you stare at this for an hour, you can actually figure out the exact rule:

Claim 11

Given an input ${a_1a_2a_3|b_1b_2b_3b_4}$, move ${b_{i+1}}$ if ${i}$ is the largest index for which ${a_i < b_{i+1}}$, or ${b_1 = 1}$ if no such index exists.
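Here is a short Python sketch of the rule in Claim 11, so it can be checked against the table above. The names `apply_W` and `W_str` are made up for this illustration. The code does not hard-code the block lengths, but it has only been compared against the ${3|4}$ data listed here.

```python
def apply_W(left, right):
    """Apply the rule of Claim 11 to a reduced input word left|right.

    left  = (a_1, ..., a_m) and right = (b_1, ..., b_{m+1}), both increasing.
    Returns the output word as a pair of tuples.
    """
    moved = right[0]                       # fallback: move b_1
    for i in range(len(left), 0, -1):      # i = m, m-1, ..., 1
        if left[i - 1] < right[i]:         # a_i < b_{i+1}
            moved = right[i]               # largest such i wins
            break
    new_left = tuple(sorted(left + (moved,)))
    new_right = tuple(b for b in right if b != moved)
    return new_left, new_right

def W_str(word):
    """Same rule on strings such as '257|1346' (single-digit letters)."""
    left, right = word.split("|")
    new_left, new_right = apply_W(tuple(map(int, left)), tuple(map(int, right)))
    return "".join(map(str, new_left)) + "|" + "".join(map(str, new_right))
```

For instance, `W_str("257|1346")` returns `"2357|146"` and `W_str("357|1246")` returns `"1357|246"`, as in the table.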

And indeed, once I have this bijection, it takes maybe only another hour of thinking to verify that this bijection works as advertised, thus solving the original problem.

Rather than writing up what I had found, I celebrated that Sunday evening by playing Wesnoth for 2.5 hours.

## 5. Generalization

### 5.1. Surprise

On Monday morning I was mindlessly feeding inputs to the program I had worked on earlier and finally noticed that in fact ${\mathcal L_6(1,3,5,2)}$ and ${\mathcal L_6(1,5,3,2)}$ also had the same cardinality. Huh.

It seemed too good to be true, but I played around some more, and sure enough, the cardinality ${\#\mathcal L_{k+2}(a_1, \dots, a_n)}$ seemed not to depend on the order of the ${a_i}$’s at all. And so at last I stumbled upon the final form of the conjecture, realizing that all along the assumption ${a_i \in \{k,k+1\}}$ that I had been working with was a red herring, and that the bijection was really true in much vaster generality. There is a bijection

$\displaystyle \mathcal L_{k+2}(a_1, \dots, a_i, a_{i+1}, \dots, a_n) \rightarrow \mathcal L_{k+2}(a_1, \dots, a_{i+1}, a_{i}, \dots, a_n)$

which only involves rearranging the elements of the ${i}$th and ${(i+1)}$st blocks.

It also meant I had more work to do, and so I was now glad that I hadn’t written up my work from yesterday night.

### 5.2. More data science

I re-ran the experiment I had done before, now with ${\mathcal L_7(2,3,5,2) \rightarrow \mathcal L_7(2,5,3,2)}$. (This was interesting, because the ${8}$ elements in question could now have either longest increasing subsequence of length ${5}$, or instead of length ${6}$.)

The data I obtained was:

246|13578 => 24678|135
247|13568 => 24578|136
248|13567 => 24568|137
256|13478 => 25678|134
257|13468 => 23578|146
258|13467 => 23568|147
267|13458 => 23678|145
268|13457 => 23468|157
278|13456 => 23478|156
346|12578 => 34678|125
347|12568 => 34578|126
348|12567 => 34568|127
356|12478 => 35678|124
357|12468 => 13578|246
358|12467 => 13568|247
367|12458 => 13678|245
368|12457 => 13468|257
378|12456 => 13478|256
456|12378 => 45678|123
457|12368 => 14578|236
458|12367 => 14568|237
467|12358 => 14678|235
468|12357 => 12468|357
478|12356 => 12478|356
567|12348 => 15678|234
568|12347 => 12568|347
578|12346 => 12578|346
678|12345 => 12678|345


Okay, so it looks like:

• exactly two numbers are moving each time, and
• the length of the longest run is preserved.

Eventually, I was able to work out the details, but they’re more involved than I want to reproduce here. But the idea is that you can move elements “one at a time”: something like

$\displaystyle \mathcal L_{k+2}(7,4) \rightarrow \mathcal L_{k+2}(6,5) \rightarrow \mathcal L_{k+2}(5,6) \rightarrow \mathcal L_{k+2}(4,7)$

while preserving the length of increasing subsequences at each step.

So, together with the easy observation from the beginning, this not only resolves the original problem, but also gives an elegant generalization. I had now proved:

Theorem 12

For any ${a_1}$, …, ${a_n}$, the cardinality

$\displaystyle \# \mathcal L_{k+2}(a_1, \dots, a_n)$

does not depend on the order of the ${a_i}$’s.

## 6. Discovered vs invented

Whenever I look back on this, I can’t help thinking just how incredibly lucky I got on this project.

There’s this perpetual debate about whether mathematics is discovered or invented. I think it’s results like this which make the case for “discovered”. I did not really construct the bijection ${\mathbf W}$ myself: it was “already there” and I found it by examining the data. In another world where ${\mathbf W}$ did not exist, all the creativity in the world wouldn’t have changed anything.

So anyways, that’s the behind-the-scenes tour of my favorite combinatorics paper.

# Joyal’s Proof of Cayley’s Tree Formula

I wanted to quickly write this proof up, complete with pictures, so that I won’t forget it again. In this post I’ll give a combinatorial proof (due to Joyal) of the following:

Theorem 1 (Cayley’s Formula)

The number of trees on ${n}$ labelled vertices is ${n^{n-2}}$.

Proof: We are going to construct a bijection between

• Functions ${\{1, 2, \dots, n\} \rightarrow \{1, 2, \dots, n\}}$ (of which there are ${n^n}$) and
• Trees on ${\{1, 2, \dots, n\}}$ with two distinguished nodes ${A}$ and ${B}$ (possibly ${A=B}$).

Let’s look at the first piece of data. We can visualize it as ${n}$ points floating around, each with an arrow going out of it pointing to another point, but possibly with many other arrows coming into it. Such a structure is apparently called a directed pseudoforest. Here is an example when ${n = 9}$.

You’ll notice that in each component, some of the points lie in a cycle and others do not. I’ve colored the former type of points blue, and the corresponding arrows magenta.

Thus a directed pseudoforest can also be specified by

• a choice of some vertices to be in cycles (blue vertices),
• a permutation on the blue vertices (magenta arrows), and
• attachments of trees to the blue vertices (grey vertices and arrows).

Now suppose we take the same information, but replace the permutation on the blue vertices with a total ordering instead (of course there are an equal number of these). Then we can string the blue vertices together as shown below, where the green arrows denote the selected total ordering (in this case ${1 < 9 < 2 < 4 < 8 < 5}$):

This is exactly the data of a tree on the ${n}$ vertices with two distinguished vertices, the first and last in the chain of green (which could possibly coincide). $\Box$
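The construction can be sketched in Python and checked by brute force for small ${n}$. Two choices below are assumptions made for the sketch: vertices are labelled ${0, \dots, n-1}$, and the permutation on the cyclic vertices is turned into a total ordering by reading off its one-line notation (any fixed bijection between permutations and orderings would do). The name `joyal_tree` is illustrative.

```python
from itertools import product

def joyal_tree(f):
    """Map a function f on {0, ..., n-1} (given as a tuple of values)
    to a triple (edge set, A, B) describing a doubly rooted tree."""
    n = len(f)
    # Cyclic ("blue") vertices: iterating the image n times leaves
    # exactly the vertices lying on cycles.
    cyclic = set(range(n))
    for _ in range(n):
        cyclic = {f[v] for v in cyclic}
    order = sorted(cyclic)
    # One-line notation of f restricted to the cyclic vertices: this
    # is the chain of "green arrows".
    chain = [f[c] for c in order]
    edges = {frozenset(e) for e in zip(chain, chain[1:])}
    # Every other vertex keeps its arrow, now as an undirected edge.
    edges |= {frozenset((v, f[v])) for v in range(n) if v not in cyclic}
    return frozenset(edges), chain[0], chain[-1]

def count_images(n):
    """Number of distinct triples and distinct trees hit by all n^n functions."""
    triples = {joyal_tree(f) for f in product(range(n), repeat=n)}
    trees = {tree for tree, _, _ in triples}
    return len(triples), len(trees)
```

For ${n = 4}$, `count_images(4)` gives ${256 = 4^4}$ distinct triples spread over ${16 = 4^2}$ distinct trees, as the bijection predicts.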

I’m reading through Primes of the Form ${x^2+ny^2}$, by David Cox (link; it’s good!). Here are the high-level notes I took on the first chapter, which is about the theory of quadratic forms.

(Meta point re blog: I’m probably going to start posting more and more of these more high-level notes/sketches on this blog on topics that I’ve been just learning. Up til now I’ve been mostly only posting things that I understand well and for which I have a very polished exposition. But the perfect is the enemy of the good here; given that I’m taking these notes for my own sake, I may as well share them to help others.)

## 1. Overview

Definition 1

For us a quadratic form is a polynomial ${Q = Q(x,y) = ax^2 + bxy + cy^2}$, where ${a}$, ${b}$, ${c}$ are some integers. We say that it is primitive if ${\gcd(a,b,c) = 1}$.

For example, we have the famous quadratic form

$\displaystyle Q_{\text{Fermat}}(x,y) = x^2+y^2.$

As readers are probably aware, we can say a lot about exactly which integers can be represented by ${Q_{\text{Fermat}}}$: by Fermat’s Christmas theorem, the primes ${p \equiv 1 \pmod 4}$ (and ${p=2}$) can all be written as the sum of two squares, while the primes ${p \equiv 3 \pmod 4}$ cannot. For convenience, let us say that:

Definition 2

Let ${Q}$ be a quadratic form. We say it represents the integer ${m}$ if there exists ${x,y \in \mathbb Z}$ with ${m = Q(x,y)}$. Moreover, ${Q}$ properly represents ${m}$ if one can find such ${x}$ and ${y}$ which are also relatively prime.

The basic question is: what can we say about which primes/integers are properly represented by a quadratic form? In fact, we will later restrict our attention to “positive definite” forms (described later).

For example, Fermat’s Christmas theorem now rewrites as:

Theorem 3 (Fermat’s Christmas theorem for primes)

An odd prime ${p}$ is (properly) represented by ${Q_{\text{Fermat}}}$ if and only if ${p \equiv 1 \pmod 4}$.

The proof of this is classical, see for example my olympiad handout. We also have the formulation for odd integers:

Theorem 4 (Fermat’s Christmas theorem for odd integers)

An odd integer ${m}$ is properly represented by ${Q_{\text{Fermat}}}$ if and only if all prime factors of ${m}$ are ${1 \pmod 4}$.

Proof: For the “if” direction, we use the fact that ${Q_{\text{Fermat}}}$ is multiplicative in the sense that

$\displaystyle (x^2+y^2)(u^2+v^2) = (xu \pm yv)^2 + (xv \mp yu)^2.$

For the “only if” part we use the fact that if a multiple of a prime ${p}$ is properly represented by ${Q_{\text{Fermat}}}$, then so is ${p}$. This follows by noticing that if ${x^2+y^2 \equiv 0 \pmod p}$ (and ${xy \not\equiv 0 \pmod p}$) then ${(x/y)^2 \equiv -1 \pmod p}$. $\Box$

Tangential remark: the two ideas in the proof will grow up in the following way.

• The fact that ${Q_{\text{Fermat}}}$ “multiplies nicely” will grow up to become the so-called composition of quadratic forms.
• The second fact will not generalize for an arbitrary form ${Q}$. Instead, we will see that if a multiple of ${p}$ is represented by a form ${Q}$ then some form of the same “discriminant” will represent the prime ${p}$, but this form need not be the same as ${Q}$ itself.
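
As a concrete instance of the multiplicative identity used in the proof of Theorem 4 (the numbers are chosen purely for illustration), take ${5 = 1^2+2^2}$ and ${13 = 2^2+3^2}$. The two choices of sign give

$\displaystyle 65 = (1 \cdot 2 + 2 \cdot 3)^2 + (1 \cdot 3 - 2 \cdot 2)^2 = 8^2 + 1^2 \qquad\text{and}\qquad 65 = (1 \cdot 2 - 2 \cdot 3)^2 + (1 \cdot 3 + 2 \cdot 2)^2 = 4^2 + 7^2,$

and both are proper representations of ${65}$.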

## 2. Equivalence of forms, and the discriminant

The first thing we should do is figure out when two forms are essentially the same: for example, ${x^2+5y^2}$ and ${5x^2+y^2}$ should clearly be considered the same. More generally, if we think of ${Q}$ as acting on ${\mathbb Z^{\oplus 2}}$ and ${T}$ is any automorphism of ${\mathbb Z^{\oplus 2}}$, then ${Q \circ T}$ should be considered the same as ${Q}$. Specifically,

Definition 5

Two forms ${Q_1}$ and ${Q_2}$ are said to be equivalent if there exists

$\displaystyle T = \begin{pmatrix} p & q \\ r & s \end{pmatrix} \in \text{GL }(2,\mathbb Z)$

such that ${Q_2(x,y) = Q_1(px+ry, qx+sy)}$. We have ${\det T = ps-qr = \pm 1}$ and so we say the equivalence is

• a proper equivalence if ${\det T = +1}$, and
• an improper equivalence if ${\det T = -1}$.

So we generally will only care about forms up to proper equivalence. (It will be useful to distinguish between proper/improper equivalence later.)

Naturally we seek some invariants under this operation. By far the most important is:

Definition 6

The discriminant of a quadratic form ${Q = ax^2 + bxy + cy^2}$ is defined as

$\displaystyle D = b^2-4ac.$

The discriminant is invariant under equivalence (check this). Note also that ${D \equiv 0 , 1 \pmod 4}$.

Observe that we have

$\displaystyle 4a \cdot (ax^2+bxy+cy^2) = (2ax + by)^2 - Dy^2.$

So if ${D < 0}$ and ${a > 0}$ (thus ${c > 0}$ too) then ${ax^2+bxy+cy^2 > 0}$ for all ${(x,y) \neq (0,0)}$. Such quadratic forms are called positive definite, and we will restrict our attention to these forms.

Now that we have this invariant, we may as well classify equivalence classes of quadratic forms for a fixed discriminant. It turns out this can be done explicitly.

Definition 7

A quadratic form ${Q = ax^2 + bxy + cy^2}$ is reduced if

• it is primitive and positive definite,
• ${|b| \le a \le c}$, and
• ${b \ge 0}$ if either ${|b| = a}$ or ${a = c}$.

Exercise 8

Check that there are only finitely many reduced forms of a fixed discriminant.

Then the big huge theorem is:

Theorem 9 (Reduced forms give a set of representatives)

Every primitive positive definite form ${Q}$ of discriminant ${D}$ is properly equivalent to a unique reduced form. We call this the reduction of ${Q}$.

Proof: Omitted due to length, but completely elementary. It is a reduction argument with some number of cases. $\Box$

Thus, for any discriminant ${D}$ we can consider the set

$\displaystyle \text{Cl}(D) = \left\{ \text{reduced forms of discriminant } D \right\}$

which, by the theorem, is a set of representatives for the proper equivalence classes of primitive positive definite forms of discriminant ${D}$. By abuse of notation we will also consider it as the set of those equivalence classes.

We also define ${h(D) = \left\lvert \text{Cl}(D) \right\rvert}$; by the exercise, ${h(D) < \infty}$. This is called the class number.

Moreover, we have ${h(D) \ge 1}$, because we can take ${x^2 - D/4 y^2}$ for ${D \equiv 0 \pmod 4}$ and ${x^2 + xy + (1-D)/4 y^2}$ for ${D \equiv 1 \pmod 4}$. We call this form the principal form.
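Since the reduced forms of a given discriminant can be listed explicitly, here is a small Python sketch that does so directly from Definition 7. The function names `reduced_forms` and `class_number` are illustrative. The search bound uses the observation that ${|b| \le a \le c}$ forces ${3a^2 \le 4ac - b^2 = -D}$.

```python
from math import gcd, isqrt

def reduced_forms(D):
    """List the reduced forms (a, b, c) of discriminant D < 0,
    where D is 0 or 1 mod 4, following Definition 7."""
    assert D < 0 and D % 4 in (0, 1)
    forms = []
    for a in range(1, isqrt(-D // 3) + 1):       # 3a^2 <= -D
        for b in range(-a, a + 1):               # |b| <= a
            if (b * b - D) % (4 * a) != 0:
                continue                          # c would not be an integer
            c = (b * b - D) // (4 * a)
            if c < a or gcd(gcd(a, b), c) != 1:
                continue                          # need a <= c and primitivity
            if b < 0 and (-b == a or a == c):
                continue                          # boundary cases require b >= 0
            forms.append((a, b, c))
    return forms

def class_number(D):
    """h(D), computed as the number of reduced forms."""
    return len(reduced_forms(D))
```

For example, `reduced_forms(-20)` returns `[(1, 0, 5), (2, 2, 3)]`, i.e. the forms ${x^2+5y^2}$ and ${2x^2+2xy+3y^2}$.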

## 3. Tables of quadratic forms

Example 10 (Examples of quadratic forms with ${h(D) = 1}$, ${D \equiv 0 \pmod 4}$)

The following discriminants have class number ${h(D) = 1}$, hence having only the principal form:

• ${D = -4}$, with form ${x^2 + y^2}$.
• ${D = -8}$, with form ${x^2 + 2y^2}$.
• ${D = -12}$, with form ${x^2+3y^2}$.
• ${D = -16}$, with form ${x^2 + 4y^2}$.
• ${D = -28}$, with form ${x^2 + 7y^2}$.

This is in fact the complete list when ${D \equiv 0 \pmod 4}$.

Example 11 (Examples of quadratic forms with ${h(D) = 1}$, ${D \equiv 1 \pmod 4}$)

The following discriminants have class number ${h(D) = 1}$, hence having only the principal form:

• ${D = -3}$, with form ${x^2 + xy + y^2}$.
• ${D = -7}$, with form ${x^2 + xy + 2y^2}$.
• ${D = -11}$, with form ${x^2 + xy + 3y^2}$.
• ${D = -19}$, with form ${x^2 + xy + 5y^2}$.
• ${D = -27}$, with form ${x^2 + xy + 7y^2}$.
• ${D = -43}$, with form ${x^2 + xy + 11y^2}$.
• ${D = -67}$, with form ${x^2 + xy + 17y^2}$.
• ${D = -163}$, with form ${x^2 + xy + 41y^2}$.

This is in fact the complete list when ${D \equiv 1 \pmod 4}$.

Example 12 (More examples of quadratic forms)

Here are tables for small discriminants with ${h(D) > 1}$. When ${D \equiv 0 \pmod 4}$ we have

• ${D = -20}$, with ${h(D) = 2}$ forms ${2x^2 + 2xy + 3y^2}$ and ${x^2 + 5y^2}$.
• ${D = -24}$, with ${h(D) = 2}$ forms ${2x^2 + 3y^2}$ and ${x^2 + 6y^2}$.
• ${D = -32}$, with ${h(D) = 2}$ forms ${3x^2 + 2xy + 3y^2}$ and ${x^2 + 8y^2}$.
• ${D = -36}$, with ${h(D) = 2}$ forms ${2x^2 + 2xy + 5y^2}$ and ${x^2 + 9y^2}$.
• ${D = -40}$, with ${h(D) = 2}$ forms ${2x^2 + 5y^2}$ and ${x^2 + 10y^2}$.
• ${D = -44}$, with ${h(D) = 3}$ forms ${3x^2 \pm 2xy + 4y^2}$ and ${x^2 + 11y^2}$.

As for ${D \equiv 1 \pmod 4}$ we have

• ${D = -15}$, with ${h(D) = 2}$ forms ${2x^2 + xy + 2y^2}$ and ${x^2 + xy + 4y^2}$.
• ${D = -23}$, with ${h(D) = 3}$ forms ${2x^2 \pm xy + 3y^2}$ and ${x^2+ xy + 6y^2}$.
• ${D = -31}$, with ${h(D) = 3}$ forms ${2x^2 \pm xy + 4y^2}$ and ${x^2 + xy + 8y^2}$.
• ${D = -39}$, with ${h(D) = 4}$ forms ${3x^2 + 3xy + 4y^2}$, ${2x^2 \pm xy + 5y^2}$ and ${x^2 + xy + 10y^2}$.

Example 13 (Even More Examples of quadratic forms)

Here are some more selected examples:

• ${D = -56}$ has ${h(D) = 4}$ forms ${x^2+14y^2}$, ${2x^2+7y^2}$ and ${3x^2 \pm 2xy + 5y^2}$.
• ${D = -108}$ has ${h(D) = 3}$ forms ${x^2+27y^2}$ and ${4x^2 \pm 2xy + 7y^2}$.
• ${D = -256}$ has ${h(D) = 4}$ forms ${x^2+64y^2}$, ${4x^2+4xy+17y^2}$ and ${5x^2\pm2xy+13y^2}$.

## 4. The Character ${\chi_D}$

We can now connect this to primes ${p}$ as follows. Earlier we played with ${Q_{\text{Fermat}} = x^2+y^2}$, and observed that for odd primes ${p}$, ${p \equiv 1 \pmod 4}$ if and only if some multiple of ${p}$ is properly represented by ${Q_{\text{Fermat}}}$.

Our generalization is as follows:

Theorem 14 (Primes represented by some quadratic form)

Let ${D < 0}$ be a discriminant, and let ${p \nmid D}$ be an odd prime. Then the following are equivalent:

• ${\left( \frac Dp \right) = 1}$, i.e. ${D}$ is a quadratic residue modulo ${p}$.
• The prime ${p}$ is (properly) represented by some reduced quadratic form in ${\text{Cl}(D)}$.

This generalizes our result for ${Q_{\text{Fermat}}}$, but note that it uses ${h(-4) = 1}$ in an essential way! That is: if ${(-1/p) = 1}$, we know ${p}$ is represented by some quadratic form of discriminant ${D = -4}$... but only since ${h(-4) = 1}$ do we know that this form reduces to ${Q_{\text{Fermat}} = x^2+y^2}$.

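Proof: First suppose ${p}$ is properly represented by ${Q = ax^2+bxy+cy^2}$, say ${Q(x,y) = p}$ with ${\gcd(x,y) = 1}$, so that ${Q(x,y) \equiv 0 \pmod p}$. Assume WLOG that ${p \nmid 4a}$ (if ${p \mid a}$ then ${D = b^2-4ac \equiv b^2 \pmod p}$ and we are done immediately). We have ${p \nmid y}$, since otherwise ${ax^2 \equiv 0}$ would imply ${x \equiv y \equiv 0 \pmod p}$. Then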

$\displaystyle 0 \equiv 4a \cdot Q(x,y) \equiv (2ax + by)^2 - Dy^2 \pmod p$

hence ${D \equiv \left( 2axy^{-1} + b \right)^2 \pmod p}$.

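The converse direction is amusing: choose an integer ${m}$ with ${m^2 \equiv D \pmod p}$ and ${m \equiv D \pmod 2}$ (replacing ${m}$ by ${m+p}$ if necessary), so that ${m^2 = D + 4pk}$ for some integer ${k}$. Consider the quadratic form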

$\displaystyle Q(x,y) = px^2 + mxy + ky^2.$

It is primitive of discriminant ${D}$ and ${Q(1,0) = p}$. Now ${Q}$ may not be reduced, but that’s fine: just take the reduction of ${Q}$, which must also properly represent ${p}$. $\Box$

Thus to every discriminant ${D < 0}$ we can attach the Legendre character (is that the name?), which is a homomorphism

$\displaystyle \chi_D = \left( \tfrac{D}{\bullet} \right) : \left( \mathbb Z / D\mathbb Z \right)^\times \rightarrow \{ \pm 1 \}$

with the property that if ${p}$ is a rational prime not dividing ${D}$, then ${\chi_D(p) = \left( \frac{D}{p} \right)}$. This is abuse of notation since I should technically write ${\chi_D(p \pmod D)}$, but there is no harm done: one can check by quadratic reciprocity that if ${p \equiv q \pmod D}$ then ${\chi_D(p) = \chi_D(q)}$. Thus our previous result becomes:

Theorem 15 (${\ker(\chi_D)}$ consists of representable primes)

Let ${p \nmid D}$ be prime. Then ${p \in \ker(\chi_D)}$ if and only if some quadratic form in ${\text{Cl}(D)}$ represents ${p}$.

As a corollary of this, using the fact that ${h(-8) = h(-12) = h(-28) = 1}$ one can prove that

Corollary 16 (Fermat-type results for ${h(-4n) = 1}$)

Let ${p > 7}$ be a prime. Then ${p}$ is

• of the form ${x^2 + 2y^2}$ if and only if ${p \equiv 1, 3 \pmod 8}$.
• of the form ${x^2 + 3y^2}$ if and only if ${p \equiv 1 \pmod 3}$.
• of the form ${x^2 + 7y^2}$ if and only if ${p \equiv 1, 2, 4 \pmod 7}$.

Proof: The congruence conditions are equivalent to ${(-4n/p) = 1}$, and as before the only point is that the only reduced quadratic form for these ${D = -4n}$ is the principal one. $\Box$
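Corollary 16 is easy to sanity-check numerically. The following Python sketch compares brute-force representability with the stated congruence conditions for all primes ${7 < p < }$ `bound`. The helper names and the dictionary `CASES` are made up for this check.

```python
from math import isqrt

def is_prime(m):
    """Trial-division primality test (fine for small m)."""
    return m >= 2 and all(m % d for d in range(2, isqrt(m) + 1))

def is_represented(p, n):
    """True if p = x^2 + n*y^2 for some integers x, y >= 0."""
    for y in range(isqrt(p // n) + 1):
        rest = p - n * y * y
        if isqrt(rest) ** 2 == rest:      # rest is a perfect square
            return True
    return False

# n -> (modulus, allowed residues), copied from Corollary 16.
CASES = {2: (8, {1, 3}), 3: (3, {1}), 7: (7, {1, 2, 4})}

def corollary_holds(bound):
    """Check the corollary for every prime p with 7 < p < bound."""
    return all(
        is_represented(p, n) == (p % m in residues)
        for p in range(8, bound) if is_prime(p)
        for n, (m, residues) in CASES.items()
    )
```

Calling `corollary_holds(2000)` confirms the three equivalences for all primes in that range.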

## 5. Genus theory

What if ${h(D) > 1}$? Sometimes, we can still figure out which primes go where just by taking mods.

Let ${Q \in \text{Cl}(D)}$. Then it represents some residue classes of ${(\mathbb Z/D\mathbb Z)^\times}$. In that case we call the set of residue classes represented the genus of the quadratic form ${Q}$.

Example 17 (Genus theory of ${D = -20}$)

Consider ${D = -20}$, with

$\displaystyle \ker(\chi_D) = \left\{ 1, 3, 7, 9 \right\} \subseteq (\mathbb Z/D\mathbb Z)^\times.$

We consider the two elements of ${\text{Cl}(D)}$:

• ${x^2 + 5y^2}$ represents ${1, 9 \in (\mathbb Z/20\mathbb Z)^\times}$.
• ${2x^2+2xy+3y^2}$ represents ${3, 7 \in (\mathbb Z/20\mathbb Z)^\times}$.

Now suppose for example that ${p \equiv 9 \pmod{20}}$. It must be represented by one of these two quadratic forms, but the latter form is never ${9 \pmod{20}}$ and so it must be the first one. Thus we conclude that

• ${p = x^2+5y^2}$ if and only if ${p \equiv 1, 9 \pmod{20}}$.
• ${p = 2x^2 + 2xy + 3y^2}$ if and only if ${p \equiv 3, 7 \pmod{20}}$.
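The two genera in this example can be recomputed by brute force. The sketch below, with the illustrative name `genus`, simply evaluates the form at all residues modulo ${|D|}$ and keeps the values that are units.

```python
from math import gcd

def genus(a, b, c):
    """Residue classes in (Z/DZ)^x represented by ax^2 + bxy + cy^2,
    where D = b^2 - 4ac, found by trying all (x, y) modulo |D|."""
    D = b * b - 4 * a * c
    m = abs(D)
    values = {(a * x * x + b * x * y + c * y * y) % m
              for x in range(m) for y in range(m)}
    return {v for v in values if gcd(v, m) == 1}
```

Here `genus(1, 0, 5)` returns `{1, 9}` and `genus(2, 2, 3)` returns `{3, 7}`, matching the two bullets in Example 17.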

The thing that makes this work is that each genus appears exactly once. We are not always so lucky: for example when ${D = -108}$ we have that

Example 18 (Genus theory of ${D = -108}$)

The three elements of ${\text{Cl}(-108)}$ are:

• ${x^2+27y^2}$, which represents exactly the ${1 \pmod 3}$ elements of ${(\mathbb Z/D\mathbb Z)^\times}$.
• ${4x^2 \pm 2xy + 7y^2}$, each of which also represents exactly the ${1 \pmod 3}$ elements of ${(\mathbb Z/D\mathbb Z)^\times}$.

So the best we can conclude is that ${p = x^2+27y^2}$ OR ${p = 4x^2\pm2xy+7y^2}$ if and only if ${p \equiv 1 \pmod 3}$. This is because the distinct quadratic forms of discriminant ${-108}$ all happen to have the same genus.

We now prove that:

Theorem 19 (Genii are cosets of ${\ker(\chi_D)}$)

Let ${D}$ be a discriminant and consider the Legendre character ${\chi_D}$.

• The genus of the principal form of discriminant ${D}$ constitutes a subgroup ${H}$ of ${\ker(\chi_D)}$, which we call the principal genus.
• Any genus of a quadratic form in ${\text{Cl}(D)}$ is a coset of the principal genus ${H}$ in ${\ker(\chi_D)}$.

Proof: For the first part, we aim to show ${H}$ is multiplicatively closed. For ${D \equiv 0 \pmod 4}$, ${D = -4n}$ we use the fact that

$\displaystyle (x^2+ny^2)(u^2+nv^2) = (xu \pm nyv)^2 + n(xv \mp yu)^2.$

For ${D \equiv 1 \pmod 4}$, we instead appeal to another “magic” identity

$\displaystyle 4\left( x^2+xy+\frac{1-D}{4}y^2 \right) \equiv (2x+y)^2 \pmod D$

and it follows from here that ${H}$ is actually the set of squares in ${(\mathbb Z/D\mathbb Z)^\times}$, which is obviously a subgroup.

Now we show that other quadratic forms have genus equal to a coset of the principal genus. For ${D \equiv 0 \pmod 4}$, with ${D = -4n}$ we can write

$\displaystyle a(ax^2+bxy+cy^2) = (ax+b/2 y)^2 + ny^2$

and thus the desired coset is shown to be ${a^{-1} H}$. As for ${D \equiv 1 \pmod 4}$, we have

$\displaystyle 4a \cdot (ax^2+bxy+cy^2) = (2ax + by)^2 - Dy^2 \equiv (2ax+by)^2 \pmod D$

so the desired coset is also ${a^{-1} H}$, since ${H}$ was the set of squares. $\Box$

Thus every genus is a coset of ${H}$ in ${\ker(\chi_D)}$. Thus:

Definition 20

We define the quotient group

$\displaystyle \text{Gen}(D) = \ker(\chi_D) / H$

which is the set of all genuses in discriminant ${D}$. One can view this as an abelian group by coset multiplication.

Thus there is a natural map

$\displaystyle \Phi_D : \text{Cl}(D) \twoheadrightarrow \text{Gen}(D).$

(The map is surjective by Theorem 14.) We also remark that ${\text{Gen}(D)}$ is quite well-behaved:

Proposition 21 (Structure of ${\text{Gen}(D)}$)

The group ${\text{Gen}(D)}$ is isomorphic to ${(\mathbb Z/2\mathbb Z)^{\oplus m}}$ for some integer ${m}$.

Proof: Observe that ${H}$ contains all the squares of ${\ker(\chi_D)}$: if ${f}$ is the principal form then ${f(t,0) = t^2}$. Thus each element of ${\text{Gen}(D)}$ has order at most ${2}$, which implies the result since ${\text{Gen}(D)}$ is a finite abelian group. $\Box$

In fact, one can compute the order of ${\text{Gen}(D)}$ exactly, but for this post I will just state the result.

Theorem 22 (Order of ${\text{Gen}(D)}$)

Let ${D < 0}$ be a discriminant, and let ${r}$ be the number of distinct odd primes which divide ${D}$. Define ${\mu}$ by:

• ${\mu = r}$ if ${D \equiv 1 \pmod 4}$.
• ${\mu = r}$ if ${D = -4n}$ and ${n \equiv 3 \pmod 4}$.
• ${\mu = r+1}$ if ${D = -4n}$ and ${n \equiv 1,2 \pmod 4}$.
• ${\mu = r+1}$ if ${D = -4n}$ and ${n \equiv 4 \pmod 8}$.
• ${\mu = r+2}$ if ${D = -4n}$ and ${n \equiv 0 \pmod 8}$.

Then ${\left\lvert \text{Gen}(D) \right\rvert = 2^{\mu-1}}$.
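The case analysis in Theorem 22 translates directly into a few lines of Python. This is only a transcription of the statement above; the function names are illustrative.

```python
def odd_prime_divisors(m):
    """Number of distinct odd primes dividing the nonzero integer m."""
    m = abs(m)
    while m % 2 == 0:
        m //= 2
    count, p = 0, 3
    while p * p <= m:
        if m % p == 0:
            count += 1
            while m % p == 0:
                m //= p
        p += 2
    return count + (1 if m > 1 else 0)

def num_genera(D):
    """|Gen(D)| = 2^(mu - 1), with mu as defined in Theorem 22."""
    assert D < 0 and D % 4 in (0, 1)
    r = odd_prime_divisors(D)
    if D % 4 == 1:
        mu = r
    else:
        n = -D // 4
        if n % 4 == 3:
            mu = r
        elif n % 4 in (1, 2) or n % 8 == 4:
            mu = r + 1
        else:                      # n is 0 mod 8
            mu = r + 2
    return 2 ** (mu - 1)
```

For instance `num_genera(-20)` is ${2}$ and `num_genera(-108)` is ${1}$, consistent with Examples 17 and 18.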

## 6. Composition

We have already used once the nice identity

$\displaystyle (x^2+ny^2)(u^2+nv^2) = (xu \pm nyv)^2 + n(xv \mp yu)^2.$

We are going to try and generalize this for any two quadratic forms in ${\text{Cl}(D)}$. Specifically,

Proposition 23 (Composition defines a group operation)

Let ${f,g \in \text{Cl}(D)}$. Then there is a unique ${h \in \text{Cl}(D)}$ and bilinear forms ${B_i(x,y,z,w) = a_ixz + b_ixw + c_iyz + d_iyw}$ for ${i=1,2}$ such that

• ${f(x,y) g(z,w) = h(B_1(x,y,z,w), B_2(x,y,z,w))}$.
• ${a_1b_2 - a_2b_1 = +f(1,0)}$.
• ${a_1c_2 - a_2c_1 = +g(1,0)}$.

In fact, without the latter two constraints we would instead have ${a_1b_2 - a_2b_1 = \pm f(1,0)}$ and ${a_1c_2 - a_2c_1 = \pm g(1,0)}$, and each choice of signs would yield one of four (possibly different) forms. So requiring both signs to be positive makes this operation well-defined. (This is why we like proper equivalence; it gives us a well-defined group structure, whereas with improper equivalence it would be impossible to put a group structure on the forms above.)

Taking this for granted, we then have that

Theorem 24 (Form class group)

Let ${D \equiv 0, 1 \pmod 4}$, ${D < 0}$ be a discriminant. Then ${\text{Cl}(D)}$ becomes an abelian group under composition, where

• The identity of ${\text{Cl}(D)}$ is the principal form, and
• The inverse of the form ${ax^2+bxy+cy^2}$ is ${ax^2-bxy+cy^2}$.

This group is called the form class group.

We then have a group homomorphism

$\displaystyle \Phi_D : \text{Cl}(D) \twoheadrightarrow \text{Gen}(D).$

Observe that ${ax^2 + bxy + cy^2}$ and ${ax^2 - bxy + cy^2}$ are inverses and that their ${\Phi_D}$ images coincide (being improperly equivalent); this is expressed in the fact that every element of ${\text{Gen}(D)}$ has order ${\le 2}$. As another corollary, every genus contains the same number of elements of ${\text{Cl}(D)}$, namely ${\left\lvert \ker \Phi_D \right\rvert}$.

We now define:

Definition 25

An integer ${n \ge 1}$ is convenient if the following equivalent conditions hold for ${D = -4n}$:

• The principal form ${x^2+ny^2}$ is the only reduced form with the principal genus.
• ${\Phi_D}$ is injective (hence an isomorphism).
• ${h(D) = 2^{\mu-1}}$.

Thus we arrive at the following corollary:

Corollary 26 (Convenient numbers have nice representations)

Let ${n \ge 1}$ be convenient. Then ${p}$ is of the form ${x^2+ny^2}$ if and only if ${p}$ lies in the principal genus.

Hence the represent-ability depends only on ${p \pmod{4n}}$.

OEIS A000926 lists 65 convenient numbers. This sequence is known to be complete except for at most one more number; moreover the list is complete assuming the Grand Riemann Hypothesis.

## 7. Cubic and quartic reciprocity

To treat the cases where ${n}$ is not convenient, the correct thing to do is develop class field theory. However, we can still make a little bit more progress if we bring higher reciprocity theorems to bear: we’ll handle the cases ${n=27}$ and ${n=64}$, two examples of numbers which are not convenient.

### 7.1. Cubic reciprocity

First, we prove that

Theorem 27 (On ${p = x^2+27y^2}$)

A prime ${p > 3}$ is of the form ${x^2+27y^2}$ if and only if ${p \equiv 1 \pmod 3}$ and ${2}$ is a cubic residue modulo ${p}$.
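Before proving Theorem 27, one can test it numerically. The Python sketch below uses the standard criterion that, for a prime ${p \equiv 1 \pmod 3}$, an integer ${a}$ coprime to ${p}$ is a cubic residue if and only if ${a^{(p-1)/3} \equiv 1 \pmod p}$. All function names are illustrative.

```python
from math import isqrt

def is_prime(m):
    """Trial-division primality test (fine for small m)."""
    return m >= 2 and all(m % d for d in range(2, isqrt(m) + 1))

def is_x2_plus_27y2(p):
    """True if p = x^2 + 27*y^2 for some integers x, y >= 0."""
    for y in range(isqrt(p // 27) + 1):
        rest = p - 27 * y * y
        if isqrt(rest) ** 2 == rest:
            return True
    return False

def two_is_cubic_residue(p):
    """For a prime p = 1 mod 3: is 2 a cube modulo p?"""
    return pow(2, (p - 1) // 3, p) == 1

def theorem_27_holds(bound):
    """Compare both sides of Theorem 27 for all primes 3 < p < bound."""
    for p in range(5, bound):
        if not is_prime(p):
            continue
        predicted = (p % 3 == 1) and two_is_cubic_residue(p)
        if predicted != is_x2_plus_27y2(p):
            return False
    return True
```

The first few primes of the form ${x^2+27y^2}$ found this way are ${31 = 2^2 + 27}$ and ${43 = 4^2 + 27}$.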

To do this we use cubic reciprocity, which requires working in the Eisenstein integers ${\mathbb Z[\omega]}$ where ${\omega}$ is a cube root of unity. There are six units in ${\mathbb Z[\omega]}$ (the sixth roots of unity), hence each nonzero number has six associates (differing by a unit), and the ring is in fact a PID.

Now if we let ${\pi}$ be a prime not dividing ${3}$, and ${\alpha}$ is coprime to ${\pi}$, then we can define the cubic Legendre symbol by setting

$\displaystyle \left( \frac{\alpha}{\pi} \right)_3 \equiv \alpha^{\frac13(N\pi-1)} \pmod \pi \in \left\{ 1, \omega, \omega^2 \right\}.$

Moreover, we can define a primary prime ${\pi \nmid 3}$ to be one such that ${\pi \equiv -1 \pmod 3}$; given any prime exactly one of the six associates is primary. We then have the following reciprocity theorem:

Theorem 28 (Cubic reciprocity)

If ${\pi}$ and ${\theta}$ are distinct primary primes in ${\mathbb Z[\omega]}$ then

$\displaystyle \left( \frac{\pi}{\theta} \right)_3 = \left( \frac{\theta}{\pi} \right)_3.$

We also have the following supplementary laws: if ${\pi = (3m-1) + 3n\omega}$, then

$\displaystyle \left( \frac{\omega}{\pi} \right)_3 = \omega^{m+n} \qquad\text{and}\qquad \left( \frac{1-\omega}{\pi} \right)_3 = \omega^{2m}.$

The first supplementary law is for the unit (analogous to ${(-1/p)}$), while the second handles the prime divisors of ${3 = -\omega^2(1-\omega)^2}$ (analogous to ${(2/p)}$).

We can tie this back into ${\mathbb Z}$ as follows. If ${p \equiv 1 \pmod 3}$ is a rational prime then it is represented by ${x^2+xy+y^2}$, and thus we can put ${p = \pi \overline{\pi}}$ for some prime ${\pi}$, ${N(\pi) = p}$. Consequently, we have a natural isomorphism

$\displaystyle \mathbb Z[\omega] / \pi \mathbb Z[\omega] \cong \mathbb Z / p \mathbb Z.$

Therefore, we see that a given ${a \in (\mathbb Z/p\mathbb Z)^\times}$ is a cubic residue if and only if ${(a/\pi)_3 = 1}$.

In particular, we have the following corollary, which is all we will need:

Corollary 29 (When ${2}$ is a cubic residue)

Let ${p \equiv 1 \pmod 3}$ be a rational prime, ${p > 3}$. Write ${p = \pi \overline{\pi}}$ with ${\pi}$ primary. Then ${2}$ is a cubic residue modulo ${p}$ if and only if ${\pi \equiv 1 \pmod 2}$.

Proof: By cubic reciprocity:

$\displaystyle \left( \frac{2}{\pi} \right)_3 = \left( \frac{\pi}{2} \right)_3 \equiv \pi^{\frac13(N2-1)} \equiv \pi \pmod 2.$

$\Box$

Now we give the proof of Theorem 27. Proof: First assume

$\displaystyle p = x^2+27y^2 = \left( x+3\sqrt{-3} y \right)\left( x-3\sqrt{-3} y \right).$

Replacing ${x}$ by ${-x}$ if necessary, we may assume ${\pi = x + 3 \sqrt{-3} y = (x+3y) + 6y\omega}$ is primary; note also that ${\pi \equiv 1 \pmod 2}$, since ${p}$ is odd. Now clearly ${p \equiv 1 \pmod 3}$, so we are done by Corollary 29.

For the converse, assume ${p \equiv 1 \pmod 3}$, ${p = \pi \overline{\pi}}$ with ${\pi}$ primary and ${\pi \equiv 1 \pmod 2}$. If we set ${\pi = a + b\omega}$ for integers ${a}$ and ${b}$, then the fact that ${\pi \equiv 1 \pmod 2}$ and ${\pi \equiv -1 \pmod 3}$ is enough to imply that ${6 \mid b}$ (check it!). Moreover,

$\displaystyle p = a^2-ab+b^2 = \left( a - \frac{1}{2} b \right)^2 + 27 \left( \frac16b \right)^2$

as desired. $\Box$
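Theorem 27 is easy to test numerically. The sketch below (helper names and the bound 3000 are mine, chosen for illustration) uses the standard criterion that for a prime ${p \equiv 1 \pmod 3}$, a number ${a}$ coprime to ${p}$ is a cubic residue exactly when ${a^{(p-1)/3} \equiv 1 \pmod p}$, and compares this against a brute-force search for ${x^2+27y^2}$.

```python
from math import isqrt

def is_prime(m):
    """Trial division; fine for the small bounds used here."""
    return m >= 2 and all(m % d for d in range(2, isqrt(m) + 1))

def is_represented(p, n):
    """True if p = x^2 + n*y^2 for some integers x, y."""
    y = 0
    while n * y * y <= p:
        rest = p - n * y * y
        if isqrt(rest) ** 2 == rest:
            return True
        y += 1
    return False

def two_is_cubic_residue(p):
    """For a prime p = 1 (mod 3): 2 is a cube mod p iff 2^((p-1)/3) = 1 (mod p)."""
    return pow(2, (p - 1) // 3, p) == 1

# Primes p > 3 where the two sides of Theorem 27 disagree (should be none).
mismatches = [
    p for p in range(5, 3000)
    if is_prime(p)
    and is_represented(p, 27) != (p % 3 == 1 and two_is_cubic_residue(p))
]
print(mismatches)
```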

### 7.2. Quartic reciprocity

This time we work in ${\mathbb Z[i]}$, for which there are four units ${\pm 1}$, ${\pm i}$. A prime is primary if ${\pi \equiv 1 \pmod{2+2i}}$; every prime not dividing ${2 = -i(1+i)^2}$ has a unique associate which is primary. Then we can as before define

$\displaystyle \alpha^{\frac14(N\pi-1)} \equiv \left( \frac{\alpha}{\pi} \right)_4 \pmod{\pi} \in \left\{ \pm 1, \pm i \right\}$

where ${\pi}$ is primary, and ${\alpha}$ is nonzero mod ${\pi}$. As before, if ${p \equiv 1 \pmod 4}$ and ${p = \pi\overline{\pi}}$, then ${a}$ is a quartic residue modulo ${p}$ if and only if ${\left( a/\pi \right)_4 = 1}$, thanks to the isomorphism

$\displaystyle \mathbb Z[i] / \pi \mathbb Z[i] \cong \mathbb Z / p \mathbb Z.$

Now we have

Theorem 30 (Quartic reciprocity)

If ${\pi}$ and ${\theta}$ are distinct primary primes in ${\mathbb Z[i]}$ then

$\displaystyle \left( \frac{\theta}{\pi} \right)_4 = \left( \frac{\pi}{\theta} \right)_4 (-1)^{\frac{1}{16}(N\theta-1)(N\pi-1)}.$

We also have supplementary laws that state that if ${\pi = a+bi}$ is primary, then

$\displaystyle \left( \frac{i}{\pi} \right)_4 = i^{-\frac{1}{2}(a-1)} \qquad\text{and}\qquad \left( \frac{1+i}{\pi} \right)_4 = i^{\frac14(a-b-b^2-1)}.$

Again, the first law handles units, and the second law handles the prime divisors of ${2}$. The corollary we care about this time in fact uses only the supplemental laws:

Corollary 31 (When ${2}$ is a quartic residue)

Let ${p \equiv 1 \pmod 4}$ be a prime, and put ${p = \pi\overline{\pi}}$ with ${\pi = a+bi}$ primary. Then

$\displaystyle \left( \frac{2}{\pi} \right)_4 = i^{-b/2}$

and in particular ${2}$ is a quartic residue modulo ${p}$ if and only if ${b \equiv 0 \pmod 8}$.

Proof: Note that ${2 = i^3(1+i)^2}$ and apply the supplementary laws above. This gives

$\displaystyle \left( \frac{2}{\pi} \right)_4 = \left( \frac{i}{\pi} \right)_4^3 \left( \frac{1+i}{\pi} \right)_4^2 = i^{-\frac32(a-1)} \cdot i^{\frac12(a-b-b^2-1)} = i^{-(a-1) - \frac{1}{2} b(b+1)}.$

Now we assumed ${a+bi}$ is primary. We claim that

$\displaystyle a - 1 + \frac{1}{2} b^2 \equiv 0 \pmod 4.$

Note that since ${(a+bi)-1}$ is divisible by ${2+2i}$, the norm ${N(2+2i)=8}$ divides ${(a-1)^2+b^2}$. Thus

$\displaystyle 2(a-1) + b^2 \equiv 2(a-1) - (a-1)^2 = -(a-1)(a-3) \equiv 0 \pmod 8$

since ${a}$ is odd and ${b}$ is even. Finally,

$\displaystyle \left( \frac{2}{\pi} \right)_4 = i^{-(a-1) - \frac{1}{2} b(b+1)} = i^{-\frac{1}{2} b - (a-1+\frac{1}{2} b^2)} = i^{-\frac{1}{2} b}.$

$\Box$
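The congruence claimed in the middle of the proof can be checked by brute force. In the sketch below, `is_primary` is a hypothetical helper of mine that tests ${a+bi \equiv 1 \pmod{2+2i}}$ by dividing out ${2+2i}$ explicitly; the search window is arbitrary.

```python
def is_primary(a, b):
    """True if a + bi = 1 (mod 2 + 2i), i.e. (a - 1 + bi) / (2 + 2i) is a Gaussian integer."""
    # (a - 1 + bi)(2 - 2i) / 8 = ((a - 1 + b) + (b - a + 1) i) / 4
    return (a - 1 + b) % 4 == 0 and (b - a + 1) % 4 == 0

primary = [(a, b) for a in range(-40, 41) for b in range(-40, 41) if is_primary(a, b)]

# For primary a + bi, b is even, so b*b // 2 is exactly b^2 / 2.
claim_holds = all((a - 1 + b * b // 2) % 4 == 0 for a, b in primary)
print(len(primary), claim_holds)
```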

From here we quickly deduce

Theorem 32 (On ${p = x^2+64y^2}$)

If ${p > 2}$ is prime, then ${p = x^2+64y^2}$ if and only if ${p \equiv 1 \pmod 4}$ and ${2}$ is a quartic residue modulo ${p}$.
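To see how this follows from Corollary 31, here is a small numerical sketch (helper names and the bound 5000 are my own choices). For each prime ${p \equiv 1 \pmod 4}$ it finds the even ${b}$ in ${p = a^2+b^2}$ and compares the condition ${8 \mid b}$, which is the same as ${p = x^2+64y^2}$, against the criterion ${2^{(p-1)/4} \equiv 1 \pmod p}$ for ${2}$ to be a quartic residue.

```python
from math import isqrt

def is_prime(m):
    """Trial division; fine for the small bounds used here."""
    return m >= 2 and all(m % d for d in range(2, isqrt(m) + 1))

def even_square_part(p):
    """For a prime p = 1 (mod 4), return the even b >= 0 with p = a^2 + b^2."""
    for b in range(0, isqrt(p) + 1, 2):
        rest = p - b * b
        if isqrt(rest) ** 2 == rest:
            return b
    raise ValueError("not a sum of two squares")

def two_is_quartic_residue(p):
    """For a prime p = 1 (mod 4): 2 is a fourth power mod p iff 2^((p-1)/4) = 1 (mod p)."""
    return pow(2, (p - 1) // 4, p) == 1

primes = [p for p in range(5, 5000) if is_prime(p) and p % 4 == 1]
mismatches = [p for p in primes
              if two_is_quartic_residue(p) != (even_square_part(p) % 8 == 0)]
hits = [p for p in primes if even_square_part(p) % 8 == 0][:5]
print(mismatches, hits)
```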

# Some Thoughts on Olympiad Material Design

(This is a bit of a follow-up to the solution reading post last month. Spoiler warnings: USAMO 2014/6, USAMO 2012/2, TSTST 2016/4, and hints for ELMO 2013/1, IMO 2016/2.)

I want to say a little about the process which I use to design my olympiad handouts and classes these days (and thus by extension the way I personally think about problems). The short summary is that my teaching style is centered around showing connections and recurring themes between problems.

Now let me explain this in more detail.

## 1. Main ideas

Solutions to olympiad problems can look quite different from one another at a surface level, but typically they center around one or two main ideas, as I describe in my post on reading solutions. Because details are easy to work out once you have the main idea, as far as learning is concerned you can more or less throw away the details and pay most of your attention to main ideas.

Thus whenever I solve an olympiad problem, I make a deliberate effort to summarize the solution in a few sentences, such that I basically know how to do it from there. I also make a deliberate effort, whenever I write up a solution in my notes, to structure it so that my future self can see all the key ideas at a glance and thus be able to understand the general path of the solution immediately.

The example I’ve previously mentioned is USAMO 2014/6.

Example 1 (USAMO 2014, Gabriel Dospinescu)

Prove that there is a constant ${c>0}$ with the following property: If ${a, b, n}$ are positive integers such that ${\gcd(a+i, b+j)>1}$ for all ${i, j \in \{0, 1, \dots, n\}}$, then

$\displaystyle \min\{a, b\}> (cn)^n.$

If you look at any complete solution to the problem, you will see a lot of technical estimates involving ${\zeta(2)}$ and the like. But the main idea is very simple: “consider an ${N \times N}$ table of primes and note the small primes cannot adequately cover the board, since ${\sum p^{-2} < \frac{1}{2}}$”. Once you have this main idea the technical estimates are just the grunt work that you force yourself to do if you’re a contestant (and don’t do if you’re retired like me).
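The inequality ${\sum p^{-2} < \frac{1}{2}}$ is easy to confirm numerically. Here is a minimal sketch: the sieve helper `primes_below` and the cutoff `LIMIT` are my own choices, and the tail is bounded crudely by summing ${1/m^2}$ over all integers ${m}$ past the cutoff rather than just primes.

```python
def primes_below(limit):
    """Sieve of Eratosthenes returning all primes < limit."""
    sieve = bytearray([1]) * limit
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(range(i * i, limit, i)))
    return [i for i in range(limit) if sieve[i]]

LIMIT = 10**5
partial = sum(1 / (p * p) for p in primes_below(LIMIT))

# Tail: sum over integers m >= LIMIT of 1/m^2 < sum of 1/(m(m-1)) = 1/(LIMIT - 1).
tail_bound = 1 / (LIMIT - 1)
print(partial, partial + tail_bound)
```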

Thus the study of olympiad problems is reduced to the study of main ideas behind these problems.

## 2. Taxonomy

So how do we come up with the main ideas? Of course I won’t be able to answer this question completely, because therein lies most of the difficulty of olympiads.

But I have made some progress on it. It comes down to seeing how main ideas are similar to each other. I spend a lot of time trying to classify the main ideas into categories or themes, based on how similar they feel to one another. If I see one theme pop up over and over, then I can make it into a class.

I think olympiad taxonomy is severely underrated, and generally not done correctly. The status quo is that people do bucket sorts based on the particular technical details which are present in the problem. This is correlated with the main ideas, but the two do not always coincide.

An example where technical sort works okay is Euclidean geometry. Here is a simple example: harmonic bundles in projective geometry. As I explain in my book, there are a few “basic” configurations involved:

• Midpoints and parallel lines
• The Ceva / Menelaus configuration
• Harmonic quadrilateral / symmedian configuration
• Apollonian circle (right angle and bisectors)

(For a reference, see Lemmas 2, 4, 5 and Exercise 0 here.) Thus from experience, any time I see one of these pictures inside the current diagram, I think to myself that “this problem feels projective”; and if there is a way to do so I try to use harmonic bundles on it.

An example where technical sort fails is the “pigeonhole principle”. A typical problem in such a class looks something like USAMO 2012/2.

Example 2 (USAMO 2012, Gregory Galperin)

A circle is divided into congruent arcs by ${432}$ points. The points are colored in four colors such that some ${108}$ points are colored Red, some ${108}$ points are colored Green, some ${108}$ points are colored Blue, and the remaining ${108}$ points are colored Yellow. Prove that one can choose three points of each color in such a way that the four triangles formed by the chosen points of the same color are congruent.

It’s true that the official solution uses the words “pigeonhole principle” but that is not really the heart of the matter; the key idea is that you consider all possible rotations and count the number of incidences. (In any case, such calculations are better done using expected value anyways.)

Now why is taxonomy a good thing for learning and teaching? The reason is that building connections and seeing similarities is most easily done by simultaneously presenting several related problems. I’ve actually mentioned this already in a different blog post, but let me give the demonstration again.

Suppose I wrote down the following:

$\displaystyle \begin{array}{lll} A1 & B11 & C8 \\ A9 & B44 & C27 \\ A49 & B33 & C343 \\ A16 & B99 & C1 \\ A25 & B22 & C125 \end{array}$

You can tell what each of the ${A}$’s, ${B}$’s, ${C}$’s have in common by looking for a few moments. But what happens if I intertwine them?

$\displaystyle \begin{array}{lllll} B11 & C27 & C343 & A1 & A9 \\ C125 & B33 & A49 & B44 & A25 \\ A16 & B99 & B22 & C8 & C1 \end{array}$

This is the same information, but now you have to work much harder to notice the association between the letters and the numbers they’re next to.

This is why, if you are an olympiad student, I strongly encourage you to keep a journal or blog of the problems you’ve done. Solving olympiad problems takes lots of time and so it’s worth it to spend at least a few minutes jotting down the main ideas. And once you have enough of these, you can start to see new connections between problems you haven’t seen before, rather than being confined to thinking about individual problems in isolation. (Additionally, it means you will never have to redo problems to which you forgot the solution; learn from my mistake here.)

## 3. Ten buckets of geometry

I want to elaborate more on geometry in general. These days, if I see a solution to a Euclidean geometry problem, then I mentally store the problem and solution into one (or more) buckets. I can even tell you what my buckets are:

1. Direct angle chasing
2. Power of a point / radical axis
3. Homothety, similar triangles, ratios
4. Recognizing some standard configuration (see Yufei for a list)
5. Doing some length calculations
6. Complex numbers
7. Barycentric coordinates
8. Inversion
9. Harmonic bundles or pole/polar and homography
10. Spiral similarity, Miquel points

which my dedicated fans probably recognize as the ten chapters of my textbook. (Problems may also fall in more than one bucket if for example they are difficult and require multiple key ideas, or if there are multiple solutions.)

Now whenever I see a new geometry problem, the diagram will often “feel” similar to problems in a certain bucket. Exactly what I mean by “feel” is hard to formalize — it’s a certain gut feeling that you pick up by doing enough examples. There are some things you can say, such as “problems which feature a central circle and feet of altitudes tend to fall in bucket 6”, or “problems which only involve incidence always fall in bucket 9”. But it seems hard to come up with an exhaustive list of hard rules that will do better than human intuition.

## 4. How do problems feel?

But as I said in my post on reading solutions, there are deeper lessons to teach than just technical details.

For examples of themes on opposite ends of the spectrum, let’s move on to combinatorics. Geometry is quite structured and so the themes in the main ideas tend to translate to specific theorems used in the solution. Combinatorics is much less structured and many of the themes I use in combinatorics cannot really be formalized. (Consequently, since everyone else seems to mostly teach technical themes, several of the combinatorics themes I teach are idiosyncratic, and to my knowledge are not taught by anyone else.)

For example, one of the unusual themes I teach is called Global. It’s about the idea that to solve a problem, you can just kind of “add up everything at once”, for example using linearity of expectation, or by double-counting, or whatever. In particular these kinds of approach ignore the “local” details of the problem. It’s hard to make this precise, so I’ll just give two recent examples.

Example 3 (ELMO 2013, Ray Li)

Let ${a_1,a_2,\dots,a_9}$ be nine real numbers, not necessarily distinct, with average ${m}$. Let ${A}$ denote the number of triples ${1 \le i < j < k \le 9}$ for which ${a_i + a_j + a_k \ge 3m}$. What is the minimum possible value of ${A}$?

Example 4 (IMO 2016)

Find all integers ${n}$ for which each cell of ${n \times n}$ table can be filled with one of the letters ${I}$, ${M}$ and ${O}$ in such a way that:

• In each row and column, one third of the entries are ${I}$, one third are ${M}$ and one third are ${O}$; and
• in any diagonal, if the number of entries on the diagonal is a multiple of three, then one third of the entries are ${I}$, one third are ${M}$ and one third are ${O}$.

If you look at the solutions to these problems, they have the same “feeling” of adding everything up, even though the specific techniques are somewhat different (double-counting for the former, diagonals modulo ${3}$ for the latter). Nonetheless, my experience with problems similar to the former was immensely helpful for the latter, and it’s why I was able to solve the IMO problem.

## 5. Gaps

This perspective also explains why I’m relatively bad at functional equations. There are some things I can say that may be useful (see my handouts), but much of the time these are just technical tricks. (When sorting functional equations in my head, I have a bucket called “standard fare” meaning that you “just do work”; as far as I can tell this bucket is pretty useless.) I always feel stupid teaching functional equations, because I never have many good insights to share.

Part of the reason is that functional equations often don’t have a main idea at all. Consequently it’s hard for me to do useful taxonomy on them.

Then sometimes you run into something like the windmill problem, the solution of which is fairly “novel”, not being similar to problems that come up in training. I have yet to figure out a good way to train students to be able to solve windmill-like problems.

## 6. Surprise

I’ll close by mentioning one common way I come up with a theme.

Sometimes I will run across an olympiad problem ${P}$ which I solve quickly, and think should be very easy, and yet once I start grading ${P}$ I find that the scores are much lower than I expected. Since the way I solve problems is by drawing experience from similar previous problems, this must mean that I’ve subconsciously found a general framework to solve problems like ${P}$, which is not obvious to my students yet. So if I can put my finger on what that framework is, then I have something new to say.

The most recent example I can think of when this happened was TSTST 2016/4 which was given last June (and was also a very elegant problem, at least in my opinion).

Example 5 (TSTST 2016, Linus Hamilton)

Let ${n > 1}$ be a positive integer. Prove that we must apply the Euler ${\varphi}$ function at least ${\log_3 n}$ times before reaching ${1}$.

I solved this problem very quickly when we were drafting the TSTST exam, figuring out the solution while walking to dinner. So I was quite surprised when I looked at the scores for the problem and found out that empirically it was not that easy.

After I thought about this, I have a new tentative idea. You see, when doing this problem I really was thinking about “what does this ${\varphi}$ operation do?”. You can think of ${n}$ as an infinite tuple

$\displaystyle \left(\nu_2(n), \nu_3(n), \nu_5(n), \nu_7(n), \dots \right)$

of prime exponents. Then the ${\varphi}$ can be thought of as an operation which takes each nonzero component, decreases it by one, and then adds some particular vector back. For example, if ${\nu_7(n) > 0}$ then ${\nu_7}$ is decreased by one and each of ${\nu_2(n)}$ and ${\nu_3(n)}$ are increased by one. In any case, if you look at this behavior for long enough you will see that the ${\nu_2}$ coordinate is a natural way to “track time” in successive ${\varphi}$ operations; once you figure this out, getting the bound of ${\log_3 n}$ is quite natural. (Details left as exercise to reader.)
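If you want to watch this behavior yourself, the following sketch prints the ${\nu_2}$ coordinate along the chain ${n, \varphi(n), \varphi(\varphi(n)), \dots, 1}$ and checks the claimed bound for small ${n}$. The function names and the search range are mine, and the totient is computed by naive trial division; the code only illustrates the statement and does not replace the exercise.

```python
def phi(n):
    """Euler's totient via trial-division factorization."""
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p
            result -= result // p
        p += 1
    if m > 1:
        result -= result // m
    return result

def nu2(n):
    """Exponent of 2 in n."""
    k = 0
    while n % 2 == 0:
        n //= 2
        k += 1
    return k

def trace(n):
    """Pairs (m, nu_2(m)) along n, phi(n), phi(phi(n)), ..., 1."""
    out = []
    while n > 1:
        out.append((n, nu2(n)))
        n = phi(n)
    out.append((1, 0))
    return out

def steps_to_one(n):
    """Number of applications of phi needed to reach 1."""
    return len(trace(n)) - 1

print(trace(63))
# steps >= log_3(n) is the same as 3**steps >= n, which avoids floating point.
print(all(3 ** steps_to_one(n) >= n for n in range(2, 20000)))
```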

Now when I read through the solutions, I found that many of them had not really tried to think of the problem in such a “structured” way, and had tried to directly solve it by for example trying to prove ${\varphi(n) \ge n/3}$ (which is false) or something similar to this. I realized that had the students just ignored the task “prove ${n \le 3^k}$” and spent some time getting a better understanding of the ${\varphi}$ structure, they would have had a much better chance at solving the problem. Why had I known that structural thinking would be helpful? I couldn’t quite explain it, but it had something to do with the fact that the “main object” of the question was “set in stone”; there were no “degrees of freedom” in it, and it was concrete enough that I felt like I could understand it. Once I understood how multiple ${\varphi}$ operations behaved, the bit about ${\log_3 n}$ almost served as an “answer extraction” mechanism.

These thoughts led to the recent development of a class which I named Rigid, which is all about problems where the point is not to immediately try to prove what the question asks for, but to first step back and understand completely how a particular rigid structure (like the ${\varphi}$ in this problem) behaves, and to then solve the problem using this understanding.

(Ed Note: This was earlier posted under the incorrect title “On Designing Olympiad Training”. How I managed to mess that up is a long story involving some incompetence with Python scripts, but this is fixed now.)

Spoiler warnings: USAMO 2014/1, and hints for Putnam 2014 A4 and B2. You may want to work on these problems yourself before reading this post.

## 1. An Apology

At last year’s USA IMO training camp, I prepared a handout on writing/style for the students at MOP. One of the things I talked about was the “ocean-crossing point”, which for our purposes you can think of as the discrete jump from a problem being “essentially not solved” (${0+}$) to “essentially solved” (${7-}$). The name comes from a Scott Aaronson post:

Suppose your friend in Boston blindfolded you, drove you around for twenty minutes, then took the blindfold off and claimed you were now in Beijing. Yes, you do see Chinese signs and pagoda roofs, and no, you can’t immediately disprove him — but based on your knowledge of both cars and geography, isn’t it more likely you’re just in Chinatown? . . . We start in Boston, we end up in Beijing, and at no point is anything resembling an ocean ever crossed.

I then gave two examples of how to write a solution to the following example problem.

Problem 1 (USAMO 2014)

Let ${a}$, ${b}$, ${c}$, ${d}$ be real numbers such that ${b-d \ge 5}$ and all zeros ${x_1}$, ${x_2}$, ${x_3}$, and ${x_4}$ of the polynomial ${P(x)=x^4+ax^3+bx^2+cx+d}$ are real. Find the smallest value the product

$\displaystyle (x_1^2+1)(x_2^2+1)(x_3^2+1)(x_4^2+1)$

can take.

Proof: (Not-so-good write-up) Since ${x_j^2+1 = (x_j+i)(x_j-i)}$ for every ${j=1,2,3,4}$ (where ${i=\sqrt{-1}}$), we get ${\prod_{j=1}^4 (x_j^2+1) = \prod_{j=1}^4 (x_j+i)(x_j-i) = P(i)P(-i)}$ which equals to ${|P(i)|^2 = (b-d-1)^2 + (a-c)^2}$. If ${x_1 = x_2 = x_3 = x_4 = 1}$ this is ${16}$ and ${b-d = 5}$. Also, ${b-d \ge 5}$, this is ${\ge 16}$. $\Box$

Proof: (Better write-up) The answer is ${16}$. This can be achieved by taking ${x_1 = x_2 = x_3 = x_4 = 1}$, whence the product is ${2^4 = 16}$, and ${b-d = 5}$.

Now, we prove this is a lower bound. Let ${i = \sqrt{-1}}$. The key observation is that

$\displaystyle \prod_{j=1}^4 \left( x_j^2 + 1 \right) = \prod_{j=1}^4 (x_j - i)(x_j + i) = P(i)P(-i).$

Consequently, we have

$\displaystyle \begin{aligned} \left( x_1^2 + 1 \right) \left( x_2^2 + 1 \right) \left( x_3^2 + 1 \right) \left( x_4^2 + 1 \right) &= (b-d-1)^2 + (a-c)^2 \\ &\ge (5-1)^2 + 0^2 = 16. \end{aligned}$

This proves the lower bound. $\Box$

You’ll notice that it’s much easier to see the key idea in the second solution: namely,

$\displaystyle \prod_j (x_j^2+1) = P(i)P(-i) = (b-d-1)^2 + (a-c)^2$

which allows you to use the enigmatic condition ${b-d \ge 5}$.
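If you would like to see the key identity in action, here is a small sketch that expands the monic quartic with prescribed roots and compares both sides. The function name `product_and_identity` is hypothetical, and integer roots are used so that the arithmetic is exact.

```python
def product_and_identity(roots):
    """For the monic quartic with the given four roots, return
    (prod (x_j^2 + 1), (b - d - 1)^2 + (a - c)^2, b - d)."""
    coeffs = [1]  # coefficients of P, highest degree first
    for r in roots:
        # Multiply the current polynomial by (x - r).
        coeffs = [c - r * prev for c, prev in zip(coeffs + [0], [0] + coeffs)]
    _, a, b, c, d = coeffs
    prod = 1
    for r in roots:
        prod *= r * r + 1
    return prod, (b - d - 1) ** 2 + (a - c) ** 2, b - d

print(product_and_identity([1, 1, 1, 1]))  # the equality case
print(product_and_identity([1, 2, 3, 4]))
```

In the equality case ${x_1 = x_2 = x_3 = x_4 = 1}$ both sides come out to ${16}$ with ${b-d = 5}$, matching the write-up.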

Unfortunately I have the following confession to make:

In practice, most solutions are written more like the first one than the second one.

The truth is that writing up solutions is sort of a chore that people never really want to do but have to, much like washing dishes. So most solutions won’t be written in a way that helps you learn from them. This means that when you read solutions, you should assume that the thing you really want (i.e., the ocean-crossing point) is buried somewhere amidst a haystack of other unimportant details.

## 2. Diff

But in practice even the “better write-up” I mentioned above still has too much information in it.

Suppose you were explaining how to solve this problem to a friend. You would probably not start your explanation by saying that the minimum is ${16}$, achieved by ${x_1 = x_2 = x_3 = x_4 = 1}$ — even though this is indeed a logically necessary part of the solution. Instead, the first thing you would probably tell them is to notice that

$\displaystyle \prod_{j=1}^4 \left( x_j^2 + 1 \right) = P(i)P(-i) = (b-d-1)^2 + (a-c)^2 \ge 4^2 = 16.$

In fact, if your friend has been working on the problem for more than ten minutes, this is probably the only thing you need to tell them. They probably already figured out by themselves that there was a good chance the answer would be ${2^4 = 16}$, just based on the condition ${b-d \ge 5}$. This “one-liner” is all that they need to finish the problem. You don’t need to spell out to them the rest of the details.

When you explain a problem to a friend in this way, you’re communicating just the difference: the one or two sentences such that your friend could work out the rest of the details themselves with these directions. When reading the solution yourself, you should try to extract the main idea in the same way. Olympiad problems generally have only a few main ideas in them, from which the rest of the details can be derived. So reading the solution should feel much like searching for a needle in a haystack.

## 3. Don’t Read Line by Line

In particular: you should rarely read most of the words in the solution, and you should almost never read every word of the solution.

Whenever I read solutions to problems I didn’t solve, I often read less than 10% of the words in the solution. Instead I search aggressively for the one or two sentences which tell me the key step that I couldn’t find myself. (Functional equations are the glaring exception to this rule, since in these problems there sometimes isn’t any main idea other than “stumble around randomly”, and the steps really are all about equally important. But this is rarer than you might guess.)

I think a common mistake students make is to treat the solution as a sequence of logical steps: that is, reading the solution line by line, and then verifying that each line follows from the previous ones. This seems to entirely miss the point, because not all lines are created equal, and most lines can be easily derived once you figure out the main idea.

If you find that the only way that you can understand the solution is reading it step by step, then the problem may simply be too hard for you. This is because what counts as “details” and “main ideas” are relative to the absolute difficulty of the problem. Here’s an example of what I mean: the solution to a USAMO 3/6 level geometry problem, call it ${P}$, might look as follows.

Proof: First, we prove lemma ${L_1}$. (Proof of ${L_1}$, which is USAMO 1/4 level.)

Then, we prove lemma ${L_2}$. (Proof of ${L_2}$, which is USAMO 1/4 level.)

Finally, we remark that putting together ${L_1}$ and ${L_2}$ solves the problem. $\Box$

Likely the main difficulty of ${P}$ is actually finding ${L_1}$ and ${L_2}$. So a very experienced student might think of the sub-proofs ${L_i}$ as “easy details”. But younger students might find ${L_i}$ challenging in their own right, and be unable to solve the problem even after being told what the lemmas are: which is why it is hard for them to tell that ${\{L_1, L_2\}}$ were the main ideas to begin with. In that case, the problem ${P}$ is probably way over their head.

This is also why it doesn’t make sense to read solutions to problems which you have not worked on at all — there are often details, natural steps and notation, et cetera which are obvious to you if and only if you have actually tried the problem for a little while yourself.

## 4. Reflection

The earlier sections describe how to extract the main idea of an olympiad solution. This is neat because instead of having to remember an entire solution, you only need to remember a few sentences now, and it gives you a good understanding of the solution at hand.

But this still isn’t achieving your ultimate goal in learning: you are trying to maximize your scores on future problems. Unless you are extremely fortunate, you will probably never see the exact same problem on an exam again.

So one question you should often ask is:

“How could I have thought of that?”

(Or in my case, “how could I train a student to think of this?”.)

There are probably some surface-level skills that you can pick out of this. The lowest hanging fruit is things that are technical. A small number of examples, with varying amounts of depth:

• This problem is “purely projective”, so we can take a projective transformation!
• This problem had a segment ${AB}$ with midpoint ${M}$, and a line ${\ell}$ parallel to ${AB}$, so I should consider projecting ${(AB;M\infty)}$ through a point on ${\ell}$.
• Drawing a grid of primes is the only real idea in this problem, and the rest of it is just calculations.
• This main claim is easy to guess since in some small cases, the frogs have “violating points” in a large circle.
• In this problem there are ${n}$ numbers on a circle, ${n}$ odd. The counterexamples for ${n}$ even alternate up and down, which motivates proving that no three consecutive numbers are in sorted order.
• This is a juggling problem!

(Brownie points if any contest enthusiasts can figure out which problems I’m talking about in this list!)

## 5. Learn Philosophy, not Formalism

But now I want to point out that the best answers to the above question are often not formalizable. Lists of triggers and actions are “cheap forms of understanding”, because going through a list of methods will only get you so far.

On the other hand, the un-formalizable philosophy that you can extract from reading a question is part of that legendary “intuition” that people are always talking about: you can’t describe it in words, but it’s certainly there. Maybe it would even be better if I reframed the question as:

“What does this problem feel like?”

So let’s talk about our feelings. Here is David Yang’s take on it:

Whenever you see a problem you really like, store it (and the solution) in your mind like a cherished memory . . . The point of this is that you will see problems which will remind you of that problem despite having no obvious relation. You will not be able to say concretely what the relation is, but think a lot about it and give a name to the common aspect of the two problems. Eventually, you will see new problems for which you feel like could also be described by that name.

Do this enough, and you will have a very powerful intuition that cannot be described easily concretely (and in particular, that nobody else will have).

This itself doesn’t make sense without an example, so here is an example of one philosophy I’ve developed. Here are two problems on Putnam 2014:

Problem 2 (Putnam 2014 A4)

Suppose ${X}$ is a random variable that takes on only nonnegative integer values, with ${\mathbb E[X] = 1}$, ${\mathbb E[X^2] = 2}$, and ${\mathbb E[X^3] = 5}$. Determine the smallest possible value of the probability of the event ${X=0}$.

Problem 3 (Putnam 2014 B2)

Suppose that ${f}$ is a function on the interval ${[1,3]}$ such that ${-1\le f(x)\le 1}$ for all ${x}$ and

$\displaystyle \int_1^3 f(x) \; dx=0.$

How large can ${\int_1^3 \frac{f(x)}{x} \; dx}$ be?

At a glance there seems to be nearly no connection between these problems. One of them is a combinatorics/algebra question, and the other is an integral. Moreover, if you read the official solutions or even my own write-ups, you will find very little in common joining them.

Yet it turns out that these two problems do have something in common to me, which I’ll try to describe below. My thought process in solving either question went as follows:

In both problems, I was able to quickly make a good guess as to what the optimal ${X}$/${f}$ was, and then come up with a heuristic explanation (not a proof) why that guess had to be correct, namely, “by smoothing, you should put all the weight on the left”. Let me call this optimal argument ${A}$.

That conjectured ${A}$ gave a numerical answer to the actual problem: but for both of these problems, it turns out that numerical answer is completely uninteresting, as are the exact details of ${A}$. It should philosophically be interpreted as “this is the number that happens to pop out when you plug in the optimal choice”. And indeed that’s what both solutions feel like. These solutions don’t actually care what the exact values of ${A}$ are; they only care about the properties that made me think they were optimal in the first place.

I gave this philosophy the name Equality, with poster description “problems where looking at the equality case is important”. This text description feels more or less useless to me; I suppose it’s the thought that counts. But ever since I came up with this name, it has helped me solve new problems that come up, because they would give me the same feeling that these two problems did.

Two more examples of these themes that I’ve come up with are Global and Rigid, which will be described in a future post on how I design training materials.

# Holomorphic Logarithms and Roots

In this post we’ll make sense of a holomorphic square root and logarithm. Wrote this up because I was surprised how hard it was to find a decent complete explanation.

Let ${f : U \rightarrow \mathbb C}$ be a holomorphic function. A holomorphic ${n}$th root of ${f}$ is a function ${g : U \rightarrow \mathbb C}$ such that ${f(z) = g(z)^n}$ for all ${z \in U}$. A logarithm of ${f}$ is a function ${g : U \rightarrow \mathbb C}$ such that ${f(z) = e^{g(z)}}$ for all ${z \in U}$. The main question we’ll try to figure out is: when do these exist? In particular, what if ${f = \mathrm{id}}$?

## 1. Motivation: Square Root of a Complex Number

To start us off, can we define ${\sqrt z}$ for any complex number ${z}$?

The first obvious problem that comes up is that for any ${z \neq 0}$, there are two numbers ${w}$ such that ${w^2 = z}$. How can we pick one to use? For our ordinary square root function, we had a notion of “positive”, and so we simply took the positive root.

Let’s expand on this: given ${ z = r \left( \cos\theta + i \sin\theta \right) }$ (here ${r \ge 0}$) we should take the root to be

$\displaystyle w = \sqrt{r} \left( \cos \alpha + i \sin \alpha \right).$

such that ${2\alpha \equiv \theta \pmod{2\pi}}$; there are two choices for ${\alpha \pmod{2\pi}}$, differing by ${\pi}$.

For complex numbers, we don’t have an obvious way to pick ${\alpha}$. Nonetheless, perhaps we can also get away with an arbitrary distinction: let’s see what happens if we just choose the ${\alpha}$ with ${-\frac{1}{2}\pi < \alpha \le \frac{1}{2}\pi}$.

Pictured below are some points (in red) and their images (in blue) under this “right-half” square root. The condition on ${\alpha}$ means we are forcing the blue points to lie on the right-half plane.

Here, ${w_i^2 = z_i}$ for each ${i}$, and we are constraining the ${w_i}$ to lie in the right half of the complex plane. We see there is an obvious issue: there is a big discontinuity near the points ${z_5}$ and ${z_7}$! The nearby point ${w_6}$ has been mapped very far away. This discontinuity occurs since the points on the negative real axis are at the “boundary”. For example, given ${-4}$, we send it to ${2i}$, but we have hit the boundary: in our interval ${-\frac{1}{2}\pi < \alpha \le \frac{1}{2}\pi}$, we are at the very right edge.

The negative real axis that we must not touch is what we will later call a branch cut, but for now I call it a ray of death. It is a warning to the red points: if you cross this line, you will die! However, if we move the red circle just a little upwards (so that it misses the negative real axis) this issue is avoided entirely, and we get what seems to be a “nice” square root.

In fact, the ray of death is fairly arbitrary: it is the set of “boundary issues” that arose when we picked ${-\frac{1}{2}\pi < \alpha \le \frac{1}{2}\pi}$. Suppose we instead insisted on the interval ${0 \le \alpha < \pi}$; then the ray of death would be the positive real axis instead. The earlier circle we had now works just fine.

What we see is that picking a particular ${\alpha}$-interval leads to a different set of edge cases, and hence a different ray of death. The only thing these rays have in common is their starting point of zero. In other words, given a red circle and a restriction of ${\alpha}$, I can make a nice “square rooted” blue circle as long as the ray of death misses it.
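The ray of death can also be seen numerically. Here is a minimal Python sketch; the helper name `sqrt_with_alpha_from` is made up for this illustration, and it uses the half-open window ${[\text{start}, \text{start} + \pi)}$ for ${\alpha}$, which differs from the intervals above only on the boundary ray itself.

```python
import cmath
import math

def sqrt_with_alpha_from(z, start):
    """Return the square root of z whose argument alpha lies in [start, start + pi)."""
    r, theta = cmath.polar(z)          # z = r (cos theta + i sin theta)
    alpha = theta / 2                  # one solution of 2*alpha = theta (mod 2*pi)
    alpha = start + (alpha - start) % math.pi  # shift by a multiple of pi into the window
    return cmath.rect(math.sqrt(r), alpha)

eps = 1e-9
above, below = complex(-4, eps), complex(-4, -eps)   # straddle the negative real axis

# Window starting at -pi/2: the two nearby inputs land near 2i and -2i.
jump_neg = abs(sqrt_with_alpha_from(above, -math.pi / 2)
               - sqrt_with_alpha_from(below, -math.pi / 2))

# Window starting at 0: the same two inputs now have nearby roots ...
smooth_neg = abs(sqrt_with_alpha_from(above, 0.0) - sqrt_with_alpha_from(below, 0.0))

# ... but the discontinuity has moved to the positive real axis.
jump_pos = abs(sqrt_with_alpha_from(complex(4, eps), 0.0)
               - sqrt_with_alpha_from(complex(4, -eps), 0.0))

print(jump_neg, smooth_neg, jump_pos)
```

The first and third numbers should come out close to ${4}$ (a jump between ${\pm 2i}$ or ${\pm 2}$), while the middle one is tiny: changing the window moves the ray, it does not remove it.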

So, what exactly is going on?

## 2. Square Roots of Holomorphic Functions

To get a picture of what’s happening, we would like to consider a more general problem: let ${f: U \rightarrow \mathbb C}$ be holomorphic. Then we want to decide whether there is a ${g : U \rightarrow \mathbb C}$ such that

$\displaystyle f(z) = g(z)^2.$

Our previous discussion when ${f = \mathrm{id}}$ tells us we cannot hope to achieve this for ${U = \mathbb C}$; there is a “half-ray” which causes problems. However, there are certainly functions ${f : \mathbb C \rightarrow \mathbb C}$ such that a ${g}$ exists. As a simplest example, ${f(z) = z^2}$ should definitely have a square root!

Now let’s see if we can fudge together a square root. Earlier, what we did was try to specify a rule to force one of the two choices at each point. This is unnecessarily strict. Perhaps we can do something like the following: start at a point ${z_0 \in U}$, pick a square root ${w_0}$ of ${f(z_0)}$, and then try to “fudge” from there the square roots of the other points. What do I mean by fudge? Well, suppose ${z_1}$ is a point very close to ${z_0}$, and we want to pick a square root ${w_1}$ of ${f(z_1)}$. While there are two choices, we also would expect ${w_0}$ to be close to ${w_1}$. Unless we are highly unlucky, this should tell us which choice of ${w_1}$ to pick. (Stupid concrete example: if I have taken the square root ${-4.12i}$ of ${-17}$ and then ask you to continue this square root to ${-16}$, which sign should you pick for ${\pm 4i}$?)

There are two possible ways we could get unlucky in the scheme above: first, if ${w_0 = 0}$, then we’re sunk. But even if we avoid that, we have to worry that we are in a situation where we run around a full loop in the complex plane, and then find that our continuous perturbation has left us in a different place than we started. For concreteness, consider the following situation, again with ${f = \mathrm{id}}$:

We started at the point ${z_0}$, with one of its square roots as ${w_0}$. We then wound a full red circle around the origin, only to find that at the end of it, the blue arc is at a different place than where it started!

The interval construction from earlier doesn’t work either: no matter how we pick the interval for ${\alpha}$, any ray of death must hit our red circle. The problem somehow lies with the fact that we have enclosed the very special point ${0}$.

Nevertheless, we know that if we take ${f(z) = z^2}$, then we don’t run into any problems with our “make it up as you go” procedure. So, what exactly is going on?
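The “make it up as you go” procedure is easy to simulate. The following Python sketch (the function name `continue_sqrt` and the step count are choices made for this illustration) walks once around the unit circle, at each step keeping whichever of the two square roots is closer to the previous one.

```python
import cmath
import math

def continue_sqrt(f, path, w0):
    """Follow a square root of f along path, always taking the root nearest the previous one."""
    w = w0
    for z in path[1:]:
        r = cmath.sqrt(f(z))                       # one of the two roots; the other is -r
        w = r if abs(r - w) <= abs(r + w) else -r  # keep whichever is closer to the last value
    return w

N = 1000
loop = [cmath.exp(2j * math.pi * k / N) for k in range(N + 1)]  # unit circle, from 1 back to 1

end_id = continue_sqrt(lambda z: z, loop, 1)       # f = id, starting from w0 = 1
end_sq = continue_sqrt(lambda z: z * z, loop, 1)   # f(z) = z^2, starting from w0 = 1

print(end_id)  # close to -1
print(end_sq)  # close to +1
```

For ${f = \mathrm{id}}$ the continued root returns as ${-w_0}$, while for ${f(z) = z^2}$ it returns to ${w_0}$, matching the two behaviors described above.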

## 3. Covering Projections

By now, if you have read the part on algebraic topology, this should all seem very strangely familiar. The “fudging” procedure exactly describes the idea of a lifting.

More precisely, recall that there is a covering projection

$\displaystyle (-)^2 : \mathbb C \setminus \{0\} \rightarrow \mathbb C \setminus \{0\}.$

Let ${V = \left\{ z \in U \mid f(z) \neq 0 \right\}}$. For ${z \in U \setminus V}$, we already have the square root ${g(z) = \sqrt{f(z)} = \sqrt 0 = 0}$. So the burden is completing ${g : V \rightarrow \mathbb C}$.

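Then essentially, what we are trying to do is construct a lifting ${g : V \rightarrow E}$ of ${f : V \rightarrow B}$ along ${p : E \rightarrow B}$, where ${E = B = \mathbb C \setminus \{0\}}$ and ${p = (-)^2}$ is the covering projection above. Our map ${p}$ can be described as “winding around twice”. From algebraic topology, we now know that this lifting exists if and only if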

$\displaystyle f_\ast(\pi_1(V)) \subseteq p_\ast(\pi_1(E))$

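that is, the image of ${\pi_1(V)}$ under ${f}$ is a subset of the image of ${\pi_1(E)}$ under ${p}$. Since ${B}$ and ${E}$ are both punctured planes, they are homotopy equivalent to ${S^1}$, so ${\pi_1(B) \cong \pi_1(E) \cong \mathbb Z}$.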

Ques 1

Show that the image under ${p}$ is exactly ${2\mathbb Z}$ once we identify ${\pi_1(B) = \mathbb Z}$.

That means that for any loop ${\gamma}$ in ${V}$, we need ${f \circ \gamma}$ to have an even winding number around ${0 \in B}$. This amounts to

$\displaystyle \frac{1}{2\pi i} \oint_\gamma \frac{f'}{f} \; dz \in 2\mathbb Z$

since ${f}$ has no poles.

Replacing ${2}$ with ${n}$ and carrying over the discussion gives the first main result.

Theorem 2 (Existence of Holomorphic ${n}$th Roots)

Let ${f : U \rightarrow \mathbb C}$ be holomorphic. Then ${f}$ has a holomorphic ${n}$th root if and only if

$\displaystyle \frac{1}{2\pi i}\oint_\gamma \frac{f'}{f} \; dz \in n\mathbb Z$

for every contour ${\gamma}$ in ${U}$.
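As a sanity check on the criterion, here is a small Python sketch that approximates the integral over a circle centered at the origin by a Riemann sum. The function name `log_derivative_integral` and its parameters are invented for this example; only circular contours are handled.

```python
import cmath
import math

def log_derivative_integral(f, df, radius=1.0, steps=2000):
    """Approximate (1 / (2 pi i)) * contour integral of f'/f over the circle |z| = radius."""
    total = 0j
    for k in range(steps):
        t = 2 * math.pi * k / steps
        z = radius * cmath.exp(1j * t)
        dz = 1j * z * (2 * math.pi / steps)   # dz = i z dt on a circle centered at 0
        total += df(z) / f(z) * dz
    return total / (2j * math.pi)

# f = id winds once around 0: odd, so no holomorphic square root on a region containing this loop.
n_id = log_derivative_integral(lambda z: z, lambda z: 1)
# f(z) = z^2 winds twice: even, consistent with the obvious square root g(z) = z.
n_sq = log_derivative_integral(lambda z: z * z, lambda z: 2 * z)

print(round(n_id.real), round(n_sq.real))
```

The two values are ${1}$ and ${2}$, so the ${n = 2}$ case of the criterion fails for ${f = \mathrm{id}}$ and holds for ${f(z) = z^2}$ on this loop.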

## 4. Complex Logarithms

The multivalued nature of the complex logarithm comes from the fact that

$\displaystyle \exp(z+2\pi i) = \exp(z).$

So if ${e^w = z}$, then any complex number ${w + 2\pi i k}$ is also a solution.

We can handle this in the same way as before: it amounts to a lifting of the following diagram. There is no longer a need to work with a separate ${V}$ since:

Ques 3

Show that if ${f}$ has any zeros then ${g}$ can’t possibly exist.

In fact, the map ${\exp : \mathbb C \rightarrow \mathbb C\setminus\{0\}}$ is a universal cover, since ${\mathbb C}$ is simply connected. Thus, ${p_\ast(\pi_1(\mathbb C))}$ is trivial. So in addition to being zero-free, ${f}$ cannot have any winding number around ${0 \in B}$ at all. In other words:

Theorem 4 (Existence of Logarithms)

Let ${f : U \rightarrow \mathbb C}$ be holomorphic. Then ${f}$ has a logarithm if and only if

$\displaystyle \frac{1}{2\pi i}\oint_\gamma \frac{f'}{f} \; dz = 0$

for every contour ${\gamma}$ in ${U}$.

## 5. Some Special Cases

The most common special case is

Corollary 5 (Nonvanishing Functions from Simply Connected Domains)

Let ${f : \Omega \rightarrow \mathbb C}$ be holomorphic, where ${\Omega}$ is simply connected. If ${f(z) \neq 0}$ for every ${z \in \Omega}$, then ${f}$ has both a logarithm and a holomorphic ${n}$th root.

Finally, let’s return to the question of ${f = \mathrm{id}}$ from the very beginning. What’s the best domain ${U}$ such that we can define ${\sqrt{-} : U \rightarrow \mathbb C}$? Clearly ${U = \mathbb C}$ cannot be made to work, but we can do almost as well. For note that the only zero of ${f = \mathrm{id}}$ is at the origin. Thus if we want to make a logarithm exist, all we have to do is make an incision in the complex plane that renders it impossible to make a loop around the origin. The usual choice is to delete the negative half of the real axis, our very first ray of death; we call this a branch cut, with branch point at ${0 \in \mathbb C}$ (the point which we cannot circle around). This gives

Theorem 6 (Branch Cut Functions)

There exist holomorphic functions

$\displaystyle \begin{aligned} \log &: \mathbb C \setminus (-\infty, 0] \rightarrow \mathbb C \\ \sqrt[n]{-} &: \mathbb C \setminus (-\infty, 0] \rightarrow \mathbb C \end{aligned}$

satisfying the obvious properties.

There are many possible choices of such functions (${n}$ choices for the ${n}$th root and infinitely many for ${\log}$); a choice of such a function is called a branch. So this is what is meant by a “branch” of a logarithm.

The principal branch is the “canonical” branch, analogous to the way we arbitrarily pick the positive branch to define ${\sqrt{-} : \mathbb R_{\ge 0} \rightarrow \mathbb R_{\ge 0}}$. For ${\log}$, we take the ${w}$ such that ${e^w = z}$ and the imaginary part of ${w}$ lies in ${(-\pi, \pi]}$ (since we can shift by integer multiples of ${2\pi i}$). Often, authors will write ${\text{Log } z}$ to emphasize this choice.
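For a concrete feel, Python’s standard `cmath.log` returns the principal branch. The short sketch below (the sample point ${-1 + i}$ is arbitrary) checks that its imaginary part lands in ${(-\pi, \pi]}$ and that shifting by multiples of ${2\pi i}$ gives other logarithms of the same number.

```python
import cmath
import math

z = complex(-1, 1)
w = cmath.log(z)                 # cmath.log returns the principal branch
print(w.imag)                    # 3*pi/4, inside (-pi, pi]

# Every other logarithm of z differs from w by an integer multiple of 2*pi*i.
others = [w + 2j * math.pi * k for k in (-2, -1, 1, 2)]
errors = [abs(cmath.exp(v) - z) for v in others]
print(max(errors))               # tiny: each shifted value is still a logarithm of z
```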

Example 7

Let ${U}$ be the complex plane minus the real interval ${[0,1]}$. Then the function ${U \rightarrow \mathbb C}$ by ${z \mapsto z(z-1)}$ has a holomorphic square root.
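One way to make this example concrete is to try the explicit candidate ${g(z) = z\sqrt{1 - 1/z}}$ with the principal square root. This formula is an assumption of the sketch rather than something stated above; it works because ${1 - 1/z}$ lands on ${(-\infty, 0]}$ only when ${z \in (0, 1]}$. The Python check below compares it with naively applying the principal root to ${z(z-1)}$, which breaks along the line ${\mathrm{Re}\, z = \frac{1}{2}}$.

```python
import cmath

def g(z):
    """Candidate square root of z(z-1) on the plane minus [0, 1] (an assumed explicit formula)."""
    # 1 - 1/z lies on the cut (-inf, 0] of the principal square root only when z is in (0, 1].
    return z * cmath.sqrt(1 - 1 / z)

def naive(z):
    """Principal square root applied directly to z(z-1)."""
    return cmath.sqrt(z * (z - 1))

eps = 1e-9
left, right = complex(0.5 - eps, 1), complex(0.5 + eps, 1)  # straddle the line Re z = 1/2

print(abs(g(right) ** 2 - right * (right - 1)))  # g really squares to z(z-1)
print(abs(g(left) - g(right)))                   # g is continuous across Re z = 1/2
print(abs(naive(left) - naive(right)))           # the naive choice jumps here
```

The first two printed values are tiny, while the last is a jump of size about ${\sqrt 5}$, since ${z(z-1) = -\frac{5}{4}}$ at ${z = \frac{1}{2} + i}$.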

Corollary 8

A holomorphic function ${f : U \rightarrow \mathbb C}$ has a holomorphic ${n}$th root for all ${n \ge 1}$ if and only if it has a holomorphic logarithm.

# Facts about Lie Groups and Algebras

In Spring 2016 I was taking 18.757 Representations of Lie Algebras. Since I knew next to nothing about either Lie groups or algebras, I was forced to quickly learn about their basic facts and properties. These are the notes that I wrote up accordingly. Proofs of most of these facts can be found in standard textbooks, for example Kirillov.

## 1. Lie groups

Let ${K = \mathbb R}$ or ${K = \mathbb C}$, depending on taste.

Definition 1

A Lie group is a group ${G}$ which is also a ${K}$-manifold; the multiplication map ${G \times G \rightarrow G}$ (by ${(g_1, g_2) \mapsto g_1g_2}$) and the inversion map ${G \rightarrow G}$ (by ${g \mapsto g^{-1}}$) are required to be smooth.

A morphism of Lie groups is a map which is both a map of manifolds and a group homomorphism.

Throughout, we will let ${e \in G}$ denote the identity, or ${e_G}$ if we need further emphasis.

Note that in particular, every group ${G}$ can be made into a Lie group by endowing it with the discrete topology. This is silly, so we usually focus only on connected groups:

Proposition 2 (Reduction to connected Lie groups)

Let ${G}$ be a Lie group and ${G^0}$ the connected component of ${G}$ which contains ${e}$. Then ${G^0}$ is a normal subgroup, itself a Lie group, and the quotient ${G/G^0}$ has the discrete topology.

In fact, we can also reduce this to the study of simply connected Lie groups as follows.

Proposition 3 (Reduction to simply connected Lie groups)

If ${G}$ is connected, let ${\pi : \widetilde G \rightarrow G}$ be its universal cover. Then ${\widetilde G}$ is a Lie group, ${\pi}$ is a morphism of Lie groups, and ${\ker \pi \cong \pi_1(G)}$.

Here are some examples of Lie groups.

Example 4 (Examples of Lie groups)

• ${\mathbb R}$ under addition is a real one-dimensional Lie group.
• ${\mathbb C}$ under addition is a complex one-dimensional Lie group (and a two-dimensional real Lie group)!
• The unit circle ${S^1 \subseteq \mathbb C}$ is a real Lie group under multiplication.
• ${\text{GL }(n, K) \subset K^{\oplus n^2}}$ is a Lie group of dimension ${n^2}$. This example becomes important for representation theory: a representation of a Lie group ${G}$ is a morphism of Lie groups ${G \rightarrow \text{GL }(n, K)}$.
• ${\text{SL }(n, K) \subset \text{GL }(n, K)}$ is a Lie group of dimension ${n^2-1}$.

As geometric objects, Lie groups ${G}$ enjoy a huge amount of symmetry. For example, any neighborhood ${U}$ of ${e}$ can be “copied over” to any other point ${g \in G}$ by the natural map ${u \mapsto gu}$, whose image is ${gU}$. There is another theorem worth noting, which is that:

Proposition 5

If ${G}$ is a connected Lie group and ${U}$ is a neighborhood of the identity ${e \in G}$, then ${U}$ generates ${G}$ as a group.

## 2. Haar measure

Recall the following result and its proof from representation theory:

Claim 6

For any finite group ${G}$, ${\mathbb C[G]}$ is semisimple; all finite-dimensional representations decompose into irreducibles.

Proof: Take a representation ${V}$ and equip it with an arbitrary inner product ${\left< -,-\right>_0}$. Then we can average it to obtain a new inner product

$\displaystyle \left< v, w \right> = \frac{1}{|G|} \sum_{g \in G} \left< gv, gw \right>_0.$

which is ${G}$-invariant. Thus given a subrepresentation ${W \subseteq V}$ we can just take its orthogonal complement to decompose ${V}$. $\Box$

We would like to repeat this type of proof with Lie groups. In this case the notation ${\sum_{g \in G}}$ doesn’t make sense, so we want to replace it with an integral ${\int_{g \in G}}$ instead. In order to do this we use the following:

Theorem 7 (Haar measure)

Let ${G}$ be a Lie group. Then there exists a unique Radon measure ${\mu}$ (up to scaling) on ${G}$ which is left-invariant, meaning

$\displaystyle \mu(g \cdot S) = \mu(S)$

for any Borel subset ${S \subseteq G}$ and “translate” ${g \in G}$. This measure is called the (left) Haar measure.

Example 8 (Examples of Haar measures)

• The Haar measure on ${(\mathbb R, +)}$ is the standard Lebesgue measure which assigns ${1}$ to the closed interval ${[0,1]}$. Of course for any ${S}$, ${\mu(a+S) = \mu(S)}$ for ${a \in \mathbb R}$.
• The Haar measure on ${(\mathbb R \setminus \{0\}, \times)}$ is given by

$\displaystyle \mu(S) = \int_S \frac{1}{|t|} \; dt.$

In particular, ${\mu([a,b]) = \log(b/a)}$ for ${0 < a < b}$. One sees the invariance under multiplication of these intervals.

• Let ${G = \text{GL }(n, \mathbb R)}$. Then a Haar measure is given by

$\displaystyle \mu(S) = \int_S |\det(X)|^{-n} \; dX.$

• For the circle group ${S^1}$, consider ${S \subseteq S^1}$. We can define

$\displaystyle \mu(S) = \frac{1}{2\pi} \int_S d\varphi$

across complex arguments ${\varphi}$. The normalization factor of ${2\pi}$ ensures ${\mu(S^1) = 1}$.
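The invariance in the second example can be checked numerically. Below is a rough Python sketch using a midpoint rule; the name `haar_measure`, the step count, and the sample values of ${a}$, ${b}$, ${c}$ are all arbitrary choices for illustration, and only intervals on the positive axis are handled.

```python
import math

def haar_measure(a, b, steps=100000):
    """Midpoint-rule approximation of the integral of dt/|t| over [a, b], for 0 < a < b."""
    h = (b - a) / steps
    return sum(h / abs(a + (i + 0.5) * h) for i in range(steps))

a, b, c = 1.0, 3.0, 7.5
original = haar_measure(a, b)
scaled = haar_measure(c * a, c * b)   # the interval after multiplying by c

print(original, scaled, math.log(b / a))
```

All three printed numbers agree to many digits: scaling the interval by ${c}$ does not change its measure, and the common value is ${\log(b/a)}$.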

Note that we have:

Corollary 9

If the Lie group ${G}$ is compact, there is a unique Haar measure with ${\mu(G) = 1}$.

This follows by just noting that if ${\mu}$ is a Radon measure on a compact space ${X}$, then ${\mu(X) < \infty}$. This now lets us deduce that

Corollary 10 (Compact Lie groups are semisimple)

${\mathbb C[G]}$ is semisimple for any compact Lie group ${G}$.

Indeed, we can now consider

$\displaystyle \left< v,w\right> = \int_G \left< g \cdot v, g \cdot w\right>_0 \; dg$

as we described at the beginning.

## 3. The tangent space at the identity

In light of the previous comment about neighborhoods of ${e}$ generating ${G}$, we see that to get some information about the entire Lie group it actually suffices to just get “local” information of ${G}$ at the point ${e}$ (this is one formalization of the fact that Lie groups are super symmetric).

To do this one idea is to look at the tangent space. Let ${G}$ be an ${n}$-dimensional Lie group (over ${K}$) and consider ${\mathfrak g = T_eG}$ the tangent space to ${G}$ at the identity ${e \in G}$. Naturally, this is a ${K}$-vector space of dimension ${n}$. We call it the Lie algebra associated to ${G}$.

Example 11 (Lie algebras corresponding to Lie groups)

• ${(\mathbb R, +)}$ has a real Lie algebra isomorphic to ${\mathbb R}$.
• ${(\mathbb C, +)}$ has a complex Lie algebra isomorphic to ${\mathbb C}$.
• The unit circle ${S^1 \subseteq \mathbb C}$ has a real Lie algebra isomorphic to ${\mathbb R}$, which we think of as the “tangent line” at the point ${1 \in S^1}$.

Example 12 (${\mathfrak{gl}(n, K)}$)

Let’s consider ${\text{GL }(n, K) \subset K^{\oplus n^2}}$, an open subset of ${K^{\oplus n^2}}$. Its tangent space should just be an ${n^2}$-dimensional ${K}$-vector space. By identifying the components in the obvious way, we can think of this Lie algebra as just the set of all ${n \times n}$ matrices.

This Lie algebra goes by the notation ${\mathfrak{gl}(n, K)}$.

Example 13 (${\mathfrak{sl}(n, K)}$)

Recall ${\text{SL }(n, K) \subset \text{GL }(n, K)}$ is a Lie group of dimension ${n^2-1}$, hence its Lie algebra should have dimension ${n^2-1}$. To see what it is, let’s look at the special case ${n=2}$ first: then

$\displaystyle \text{SL }(2, K) = \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \mid ad - bc = 1 \right\}.$

Viewing this as the level surface ${f = 1}$ of the polynomial ${f(a,b,c,d) = ad-bc}$ in ${K^{\oplus 4}}$, we compute

$\displaystyle \nabla f = \left< d, -c, -b, a \right>$

and in particular the tangent space at the identity matrix ${\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}}$ is given by the orthogonal complement of the gradient

$\displaystyle \nabla f (1,0,0,1) = \left< 1, 0, 0, 1 \right>.$

Hence the tangent plane can be identified with matrices satisfying ${a+d=0}$. In other words, we see

$\displaystyle \mathfrak{sl}(2, K) = \left\{ T \in \mathfrak{gl}(2, K) \mid \text{Tr } T = 0 \right\}.$

By repeating this example in greater generality, we discover

$\displaystyle \mathfrak{sl}(n, K) = \left\{ T \in \mathfrak{gl}(n, K) \mid \text{Tr } T = 0 \right\}.$

## 4. The exponential map

Right now, ${\mathfrak g}$ is just a vector space. However, by using the group structure we can get a map from ${\mathfrak g}$ back into ${G}$. The trick is “differential equations”:

Proposition 14 (Differential equations for Lie theorists)

Let ${G}$ be a Lie group over ${K}$ and ${\mathfrak g}$ its Lie algebra. Then for every ${x \in \mathfrak g}$ there is a unique homomorphism

$\displaystyle \gamma_x : K \rightarrow G$

which is a morphism of Lie groups, such that

$\displaystyle \gamma_x'(0) = x \in T_eG = \mathfrak g.$

We will write ${\gamma_x(t)}$ to emphasize the argument ${t \in K}$ being thought of as “time”. Thus this proposition should be intuitively clear: the theory of differential equations guarantees that ${\gamma_x}$ is defined and unique in a small neighborhood of ${0 \in K}$. Then, the group structure allows us to extend ${\gamma_x}$ uniquely to the rest of ${K}$, giving a trajectory across all of ${G}$. This is sometimes called a one-parameter subgroup of ${G}$, but we won’t use this terminology anywhere in what follows.

This lets us define:

Definition 15

Retain the setting of the previous proposition. Then the exponential map is defined by

$\displaystyle \exp : \mathfrak g \rightarrow G \qquad\text{by}\qquad x \mapsto \gamma_x(1).$

The exponential map gets its name from the fact that for all the examples I discussed before, it is actually just the map ${e^\bullet}$. Note that below, ${e^T = \sum_{k \ge 0} \frac{T^k}{k!}}$ for a matrix ${T}$; this is called the matrix exponential.

Example 16 (Exponential Maps of Lie algebras)

• If ${G = \mathbb R}$, then ${\mathfrak g = \mathbb R}$ too. We observe ${\gamma_x(t) = e^{tx} \in \mathbb R}$ (where ${t \in \mathbb R}$) is a morphism of Lie groups ${\gamma_x : \mathbb R \rightarrow G}$. Hence

$\displaystyle \exp : \mathbb R \rightarrow \underbrace{\mathbb R}_{=G} \qquad \exp(x) = \gamma_x(1) = e^x \in \mathbb R = G.$

• Ditto for ${\mathbb C}$.
• For ${S^1}$ and ${x \in \mathbb R}$, the map ${\gamma_x : \mathbb R \rightarrow S^1}$ given by ${t \mapsto e^{itx}}$ works. Hence

$\displaystyle \exp : \mathbb R \rightarrow S^1 \qquad \exp(x) = \gamma_x(1) = e^{ix} \in S^1.$

• For ${\text{GL }(n, K)}$, the map ${\gamma_X : K \rightarrow \text{GL }(n, K)}$ given by ${t \mapsto e^{tX}}$ works nicely (now ${X}$ is a matrix). (Note that we have to check ${e^{tX}}$ is actually invertible for this map to be well-defined.) Hence the exponential map is given by

$\displaystyle \exp : \mathfrak{gl}(n,K) \rightarrow \text{GL }(n,K) \qquad \exp(X) = \gamma_X(1) = e^X \in \text{GL }(n, K).$

• Similarly,

$\displaystyle \exp : \mathfrak{sl}(n,K) \rightarrow \text{SL }(n,K) \qquad \exp(X) = \gamma_X(1) = e^X \in \text{SL }(n, K).$

Here we had to check that if ${X \in \mathfrak{sl}(n,K)}$, meaning ${\text{Tr } X = 0}$, then ${\det(e^X) = 1}$. This can be seen by writing ${X}$ in an upper triangular basis.
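The determinant claim is easy to test numerically. Here is a minimal pure-Python sketch for ${2 \times 2}$ matrices using a truncated power series; the helper names, the truncation at 40 terms, and the sample matrices are all assumptions of this illustration.

```python
import math

def mat_mul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def mat_exp(X, terms=40):
    """Truncated power series sum_{k < terms} X^k / k! for the matrix exponential."""
    n = len(X)
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity, the k = 0 term
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = mat_mul(term, X)
        term = [[entry / k for entry in row] for row in term]       # now term = X^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def det2(A):
    """Determinant of a 2 x 2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

X = [[0.3, 1.2], [-0.7, -0.3]]   # trace zero, so X lies in sl(2, R)
Y = [[0.5, 1.0], [0.0, 0.25]]    # trace 0.75, so Y lies in gl(2, R) but not sl(2, R)

det_X = det2(mat_exp(X))
det_Y = det2(mat_exp(Y))
print(det_X)                     # approximately 1
print(det_Y, math.exp(0.75))     # these two agree
```

The second line illustrates the general identity ${\det(e^X) = e^{\text{Tr } X}}$, of which the trace-zero case is a special instance.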

Actually, taking the tangent space at the identity is a functor. Consider a map ${\varphi : G_1 \rightarrow G_2}$ of Lie groups, with Lie algebras ${\mathfrak g_1}$ and ${\mathfrak g_2}$. Because ${\varphi}$ is a group homomorphism, ${G_1 \ni e_1 \mapsto e_2 \in G_2}$. Now, by manifold theory we know that a map ${f : M \rightarrow N}$ between manifolds gives a linear map between the corresponding tangent spaces, say ${Tf : T_pM \rightarrow T_{fp}N}$. For us we obtain a linear map

$\displaystyle \varphi_\ast = T \varphi : \mathfrak g_1 \rightarrow \mathfrak g_2.$

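In fact, this ${\varphi_\ast}$ fits into a commutative diagram with the exponential maps: ${\varphi \circ \exp = \exp \circ \varphi_\ast}$ as maps ${\mathfrak g_1 \rightarrow G_2}$.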

Here are a few more properties of ${\exp}$:

• ${\exp(0) = e \in G}$, which is immediate by looking at the constant trajectory ${\gamma_0(t) \equiv e}$.
• The total derivative of ${\exp}$ at ${0}$, viewed as a map ${D\exp : \mathfrak g \rightarrow \mathfrak g}$, is the identity. This is again by construction.
• In particular, by the inverse function theorem this implies that ${\exp}$ is a diffeomorphism in a neighborhood of ${0 \in \mathfrak g}$, onto a neighborhood of ${e \in G}$.
• ${\exp}$ commutes with the commutator. (By the above diagram.)

## 5. The commutator

Right now ${\mathfrak g}$ is still just a vector space, the tangent space. But now that there is a map ${\exp : \mathfrak g \rightarrow G}$, we can use it to put a new operation on ${\mathfrak g}$, the so-called commutator.

The idea is as follows: we want to “multiply” two elements of ${\mathfrak g}$. But ${\mathfrak g}$ is just a vector space, so we can’t do that. However, ${G}$ itself has a group multiplication, so we should pass to ${G}$ using ${\exp}$, use the multiplication in ${G}$ and then come back.

Here are the details. As we just mentioned, ${\exp}$ is a diffeomorphism near ${e \in G}$. So for ${x}$, ${y}$ close to the origin of ${\mathfrak g}$, we can look at ${\exp(x)}$ and ${\exp(y)}$, which are two elements of ${G}$ close to ${e}$. Multiplying them gives an element still close to ${e}$, so it’s equal to ${\exp(z)}$ for some unique ${z}$, call it ${\mu(x,y)}$.

One can show in fact that ${\mu}$ can be written as a Taylor series in two variables as

$\displaystyle \mu(x,y) = x + y + \frac{1}{2} [x,y] + \text{third order terms} + \dots$

where ${[x,y]}$ is a skew-symmetric bilinear map, meaning ${[x,y] = -[y,x]}$. It will be more convenient to work with ${[x,y]}$ than ${\mu(x,y)}$ itself, so we give it a name:

Definition 17

This ${[x,y]}$ is called the commutator of ${G}$.

Now we know multiplication in ${G}$ is associative, so this should give us some nontrivial relation on the bracket ${[,]}$. Specifically, since

$\displaystyle \exp(x) \left( \exp(y) \exp(z) \right) = \left( \exp(x) \exp(y) \right) \exp(z).$

we should have that ${\mu(x, \mu(y,z)) = \mu(\mu(x,y), z)}$, and this should tell us something. In fact, the claim is:

Theorem 18

The bracket ${[,]}$ satisfies the Jacobi identity

$\displaystyle [x,[y,z]] + [y,[z,x]] + [z,[x,y]] = 0.$

Proof: Although I won’t prove it, the third-order terms (and all the rest) in our expansion of ${\mu(x,y)}$ can be written out explicitly as well: for example, we actually have

$\displaystyle \mu(x,y) = x + y + \frac{1}{2} [x,y] + \frac{1}{12} \left( [x, [x,y]] + [y,[y,x]] \right) + \text{fourth order terms} + \dots.$

The general formula is called the Baker-Campbell-Hausdorff formula.

Then we can force ourselves to expand this using the first three terms of the BCH formula and then equate the degree three terms. The left-hand side expands initially as ${\mu\left( x, y + z + \frac{1}{2} [y,z] + \frac{1}{12} \left( [y,[y,z]] + [z,[z,y]] \right) \right)}$, and the next step would be something ugly.

This computation is horrifying and painful, so I’ll pretend I did it and tell you the end result is as claimed. $\Box$

There is a more natural way to see why this identity is the “right one”; see Qiaochu. However, with this proof I want to make the point that this Jacobi identity is not our decision: instead, the Jacobi identity is forced upon us by associativity in ${G}$.

Example 19 (Examples of commutators attached to Lie groups)

• If ${G}$ is an abelian group, we have ${-[y,x] = [x,y]}$ by skew-symmetry and ${[x,y] = [y,x]}$ from ${\mu(x,y) = \mu(y,x)}$. Thus ${[x,y] = 0}$ in ${\mathfrak g}$ for any abelian Lie group ${G}$.
• In particular, the brackets for ${G \in \{\mathbb R, \mathbb C, S^1\}}$ are trivial.
• Let ${G = \text{GL }(n, K)}$. Then one can show that

$\displaystyle [T,S] = TS - ST \qquad \forall S, T \in \mathfrak{gl}(n, K).$

• Ditto for ${\text{SL }(n, K)}$.
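For matrix groups the expansion of ${\mu(x,y)}$ can be tested exactly in a special case. The Python sketch below uses strictly upper triangular ${3 \times 3}$ matrices, for which ${\exp}$ and ${\log}$ are finite polynomials and all brackets of depth two vanish, so ${\mu(x,y) = x + y + \frac{1}{2}[x,y]}$ should hold on the nose with ${[x,y] = xy - yx}$. The helper names and the sample matrices are choices made for this illustration.

```python
from fractions import Fraction

def mul(A, B):
    """Product of two 3 x 3 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def add(A, B, scale=1):
    """Return A + scale * B."""
    return [[A[i][j] + scale * B[i][j] for j in range(3)] for i in range(3)]

IDENTITY = [[Fraction(int(i == j)) for j in range(3)] for i in range(3)]

def exp_nilpotent(x):
    """exp(x) = I + x + x^2/2, exact because x^3 = 0 for strictly upper triangular 3x3 x."""
    return add(add(IDENTITY, x), mul(x, x), Fraction(1, 2))

def log_unipotent(g):
    """log(g) = N - N^2/2 with N = g - I, exact because N^3 = 0."""
    N = add(g, IDENTITY, -1)
    return add(N, mul(N, N), Fraction(-1, 2))

def bracket(x, y):
    """Matrix commutator xy - yx."""
    return add(mul(x, y), mul(y, x), -1)

x = [[Fraction(v) for v in row] for row in [[0, 2, 5], [0, 0, 3], [0, 0, 0]]]
y = [[Fraction(v) for v in row] for row in [[0, 1, -4], [0, 0, 7], [0, 0, 0]]]

mu = log_unipotent(mul(exp_nilpotent(x), exp_nilpotent(y)))
predicted = add(add(x, y), bracket(x, y), Fraction(1, 2))
print(mu == predicted)
```

Exact rational arithmetic is used so that the comparison is an equality rather than an approximation.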

In any case, with the Jacobi identity we can define a general Lie algebra as an intrinsic object with a Jacobi-satisfying bracket:

Definition 20

A Lie algebra over ${k}$ is a ${k}$-vector space equipped with a skew-symmetric bilinear bracket ${[,]}$ satisfying the Jacobi identity.

A morphism of Lie algebras is a linear map which preserves the bracket.

Note that a Lie algebra may even be infinite-dimensional (even though we are assuming ${G}$ is finite-dimensional, so that they will never come up as a tangent space).

Example 21 (Associative algebra ${\rightarrow}$ Lie algebra)

Any associative algebra ${A}$ over ${k}$ can be made into a Lie algebra by taking the same underlying vector space, and using the bracket ${[a,b] = ab - ba}$.
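As a quick check of this example for the associative algebra of ${3 \times 3}$ integer matrices, the Python sketch below verifies skew-symmetry and the Jacobi identity for the bracket ${[a,b] = ab - ba}$ on randomly chosen matrices. The seed and entry range are arbitrary, and the helper names are invented for this illustration.

```python
import random

def mul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def bracket(A, B):
    """The commutator [A, B] = AB - BA in the associative algebra of n x n matrices."""
    AB, BA = mul(A, B), mul(B, A)
    n = len(A)
    return [[AB[i][j] - BA[i][j] for j in range(n)] for i in range(n)]

def add3(A, B, C):
    """Entrywise sum of three matrices."""
    n = len(A)
    return [[A[i][j] + B[i][j] + C[i][j] for j in range(n)] for i in range(n)]

random.seed(0)
n = 3
x, y, z = ([[random.randint(-9, 9) for _ in range(n)] for _ in range(n)] for _ in range(3))

jacobi = add3(bracket(x, bracket(y, z)), bracket(y, bracket(z, x)), bracket(z, bracket(x, y)))
skew = [[bracket(x, y)[i][j] + bracket(y, x)[i][j] for j in range(n)] for i in range(n)]

zero = [[0] * n for _ in range(n)]
print(jacobi == zero, skew == zero)
```

Because the entries are integers the arithmetic is exact, and both comparisons come out true.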

## 6. The fundamental theorems

We finish this list of facts by stating the three “fundamental theorems” of Lie theory. They are based upon the functor

$\displaystyle \mathscr{L} : G \mapsto T_e G$

we have described earlier, which is a functor

• from the category of Lie groups
• into the category of finite-dimensional Lie algebras.

The first theorem requires the following definition:

Definition 22

A Lie subgroup ${H}$ of a Lie group ${G}$ is a subgroup ${H}$ such that the inclusion map ${H \hookrightarrow G}$ is also an injective immersion.

A Lie subalgebra ${\mathfrak h}$ of a Lie algebra ${\mathfrak g}$ is a vector subspace preserved under the bracket (meaning that ${[\mathfrak h, \mathfrak h] \subseteq \mathfrak h}$).

Theorem 23 (Lie I)

Let ${G}$ be a real or complex Lie group with Lie algebra ${\mathfrak g}$. Then the map

$\displaystyle H \mapsto \mathscr{L}(H) \subseteq \mathfrak g$

is a bijection between connected Lie subgroups ${H \subseteq G}$ and Lie subalgebras of ${\mathfrak g}$.

Theorem 24 (The Lie functor is an equivalence of categories)

Restrict ${\mathscr{L}}$ to a functor

• from the category of simply connected Lie groups over ${K}$
• to the category of finite-dimensional Lie algebras over ${K}$.

Then

1. (Lie II) ${\mathscr{L}}$ is fully faithful, and
2. (Lie III) ${\mathscr{L}}$ is essentially surjective on objects.

If we drop the “simply connected” condition, we obtain a functor which is faithful and exact, but not full: non-isomorphic Lie groups can have isomorphic Lie algebras (one example is ${\text{SO }(3)}$ and ${\text{SU }(2)}$).

# Combinatorial Nullstellensatz and List Coloring

More than six months late, but here are notes from the combinatorial Nullstellensatz talk I gave at the student colloquium at MIT. This was also my term paper for 18.434, “Seminar in Theoretical Computer Science”.

## 1. Introducing the choice number

One of the most fundamental problems in graph theory is that of a graph coloring, in which one assigns a color to every vertex of a graph so that no two adjacent vertices have the same color. The most basic invariant related to the graph coloring is the chromatic number:

Definition 1

A simple graph ${G}$ is ${k}$-colorable if it’s possible to properly color its vertices with ${k}$ colors. The smallest such ${k}$ is the chromatic number ${\chi(G)}$.

In this exposition we study a more general notion in which the set of permitted colors is different for each vertex, as long as at least ${k}$ colors are listed at each vertex. This leads to the notion of a so-called choice number, which was introduced by Erdős, Rubin, and Taylor.

Definition 2

A simple graph ${G}$ is ${k}$-choosable if it’s possible to properly color its vertices given any list of ${k}$ colors at each vertex. The smallest such ${k}$ is the choice number ${\mathop{\mathrm{ch}}(G)}$.

Example 3

We have ${\mathop{\mathrm{ch}}(C_{2n}) = \chi(C_{2n}) = 2}$ for any integer ${n \ge 2}$ (here ${C_{2n}}$ is the cycle graph on ${2n}$ vertices). To see this, we only have to show that given a list of two colors at each vertex of ${C_{2n}}$, we can select one color from each list so that the result is a proper coloring.

• If the list of colors is the same at each vertex, then since ${C_{2n}}$ is bipartite, we are done.
• Otherwise, suppose adjacent vertices ${v_1}$, ${v_{2n}}$ are such that some color ${c}$ in the list at ${v_1}$ is not in the list at ${v_{2n}}$. Select ${c}$ at ${v_1}$, and then greedily color in ${v_2}$, ${\dots}$, ${v_{2n}}$ in that order.
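For small cases the definition can be checked by brute force. The Python sketch below tries every assignment of two-element lists to ${C_4}$ drawn from a palette of four colors; the palette size is an assumption made to keep the search small, so this is a sanity check rather than a proof. The function names are invented for this illustration.

```python
from itertools import combinations, product

def has_proper_list_coloring(edges, lists):
    """Return True if some choice of one color per vertex from its list is a proper coloring."""
    for choice in product(*lists):
        if all(choice[u] != choice[v] for u, v in edges):
            return True
    return False

def cycle_edges(m):
    """Edges of the cycle graph on vertices 0, 1, ..., m - 1."""
    return [(i, (i + 1) % m) for i in range(m)]

palette = range(4)
two_lists = list(combinations(palette, 2))     # all 2-element lists drawn from 4 colors

# Every assignment of 2-element lists to the even cycle C_4 admits a proper coloring ...
c4_ok = all(has_proper_list_coloring(cycle_edges(4), lists)
            for lists in product(two_lists, repeat=4))

# ... whereas the odd cycle C_3 already fails when all three lists coincide.
c3_ok = has_proper_list_coloring(cycle_edges(3), [(0, 1)] * 3)

print(c4_ok, c3_ok)
```

The output is `True False`: the even cycle survives every list assignment tried, while a triangle with identical two-element lists cannot be colored.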

We are thus naturally interested in how the choice number and the chromatic number are related. Of course we always have

$\displaystyle \mathop{\mathrm{ch}}(G) \ge \chi(G).$

Naïvely one might expect that we in fact have an equality, since allowing the colors at vertices to be different seems like it should make the graph easier to color. However, the following example shows that this is not the case.

Example 4 (Erdős)

Let ${n \ge 1}$ be an integer and define

$\displaystyle G = K_{n^n, n}.$

We claim that for any integer ${n \ge 1}$ we have

$\displaystyle \mathop{\mathrm{ch}}(G) \ge n+1 \quad\text{and}\quad \chi(G) = 2.$

The latter equality follows from ${G}$ being bipartite.

Now to see the first inequality, let ${G}$ have vertex set ${U \cup V}$, where ${U}$ is the set of functions ${u : [n] \rightarrow [n]}$ and ${V = [n]}$. Then consider ${n^2}$ colors ${C_{i,j}}$ for ${1 \le i, j \le n}$. On a vertex ${u \in U}$, we list colors ${C_{1,u(1)}}$, ${C_{2,u(2)}}$, ${\dots}$, ${C_{n,u(n)}}$. On a vertex ${v \in V}$, we list colors ${C_{v,1}}$, ${C_{v,2}}$, ${\dots}$, ${C_{v,n}}$. By construction it is impossible to properly color ${G}$ with these colors.
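The construction can be verified by exhaustive search for small ${n}$. In the Python sketch below, a color ${C_{i,j}}$ is represented by the pair `(i, j)`, and the function name `erdos_lists_colorable` is made up for this illustration. It loops over every way to color ${V}$ and then asks whether every ${u \in U}$ still has a usable color.

```python
from itertools import product

def erdos_lists_colorable(n):
    """Check whether K_{n^n, n} with the lists from Example 4 has a proper list coloring."""
    indices = range(1, n + 1)
    U = list(product(indices, repeat=n))      # u is stored as the tuple (u(1), ..., u(n))
    # Try every way to color V: vertex v receives color C_{v, j_v} for some j_v in [n].
    for js in product(indices, repeat=n):
        used = {(v, js[v - 1]) for v in indices}
        # A vertex u can be colored iff some color C_{i, u(i)} on its list avoids `used`.
        if all(any((i, u[i - 1]) not in used for i in indices) for u in U):
            return True
    return False

results = [erdos_lists_colorable(n) for n in (1, 2, 3)]
print(results)
```

Every entry is `False`: whatever colors ${C_{v, j_v}}$ are picked on ${V}$, the function ${u}$ with ${u(v) = j_v}$ has its whole list used up.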

The case ${n = 3}$ is illustrated in the figure below (image in public domain).

This surprising behavior is the subject of much research: how can we bound the choice number of a graph as a function of its chromatic number and other properties of the graph? We see that the above example requires exponentially many vertices in ${n}$.

Theorem 5 (Noel, West, Wu, Zhu)

If ${G}$ is a graph with ${n}$ vertices then

$\displaystyle \chi(G) \le \mathop{\mathrm{ch}}(G) \le \max\left( \chi(G), \left\lceil \frac{\chi(G)+n-1}{3} \right\rceil \right).$

In particular, if ${n \le 2\chi(G)+1}$ then ${\mathop{\mathrm{ch}}(G) = \chi(G)}$.

One of the major open problems in this direction is the following.

Definition 6

A claw-free graph is a graph with no induced ${K_{3,1}}$. For example, the line graph (also called edge graph) of any simple graph ${G}$ is claw-free.

It is conjectured that if ${G}$ is a claw-free graph, then ${\mathop{\mathrm{ch}}(G) = \chi(G)}$. In particular, this conjecture implies that for edge coloring, the notions of “chromatic number” and “choice number” coincide.

In this exposition, we prove the following result of Alon.

Theorem 7 (Alon)

A bipartite graph ${G}$ is ${\left( \left\lceil L(G) \right\rceil+1 \right)}$-choosable, where

$\displaystyle L(G) \overset{\mathrm{def}}{=} \max_{H \subseteq G} |E(H)|/|V(H)|$

is half the maximum of the average degree of subgraphs ${H}$.

In particular, recall that a planar bipartite graph with ${r \ge 3}$ vertices contains at most ${2r-4}$ edges. Since every subgraph ${H}$ of a planar bipartite graph ${G}$ is again planar and bipartite, for such graphs we have ${L(G) \le 2}$ and deduce:

Corollary 8

A planar bipartite graph is ${3}$-choosable.

This corollary is sharp: it applies to ${K_{2,4}}$, and taking ${n = 2}$ in Example 4 shows ${\mathop{\mathrm{ch}}(K_{2,4}) \ge 3}$, so ${\mathop{\mathrm{ch}}(K_{2,4}) = 3}$.
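To make the quantity ${L(G)}$ concrete, here is a small brute-force sketch that computes it for ${K_{2,4}}$. The function name `max_density` and the vertex labeling are assumptions for illustration; the search only looks at induced subgraphs, which suffices because adding edges on a fixed vertex set can only increase the ratio.

```python
from fractions import Fraction
from itertools import combinations

def max_density(vertices, edges):
    """L(G) = max over nonempty vertex subsets W of |E(G[W])| / |W|.
    Exponential in the number of vertices, so only for tiny graphs."""
    best = Fraction(0)
    for k in range(1, len(vertices) + 1):
        for W in combinations(vertices, k):
            inside = set(W)
            m = sum(1 for a, b in edges if a in inside and b in inside)
            best = max(best, Fraction(m, k))
    return best

# K_{2,4}: parts {1, 2} and {3, 4, 5, 6} (labels chosen for this example).
K24_VERTICES = [1, 2, 3, 4, 5, 6]
K24_EDGES = [(a, b) for a in (1, 2) for b in (3, 4, 5, 6)]
```

Here `max_density(K24_VERTICES, K24_EDGES)` evaluates to `Fraction(4, 3)`, attained by the whole graph (8 edges on 6 vertices), which is consistent with the bound ${L(G) \le 2}$ for planar bipartite graphs.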

The rest of the paper is organized as follows. In §2 we state Theorem 9, the famous Combinatorial Nullstellensatz of Alon. Then in §3 and §4, we provide descriptions of the so-called graph polynomial, to which we then apply the Combinatorial Nullstellensatz to deduce Theorem 18. Finally in §5, we show how to use Theorem 18 to prove Theorem 7.

## 2. Combinatorial Nullstellensatz

The main tool we use is the Combinatorial Nullstellensatz of Alon.

Theorem 9 (Combinatorial Nullstellensatz)

Let ${F}$ be a field, and let ${f \in F[x_1, \dots, x_n]}$ be a polynomial of degree ${t_1 + \dots + t_n}$. Let ${S_1, S_2, \dots, S_n \subseteq F}$ be subsets such that ${\left\lvert S_i \right\rvert > t_i}$ for all ${i}$.

Assume the coefficient of ${x_1^{t_1}x_2^{t_2}\dots x_n^{t_n}}$ in ${f}$ is not zero. Then we can pick ${s_1 \in S_1}$, \dots, ${s_n \in S_n}$ such that

$\displaystyle f(s_1, s_2, \dots, s_n) \neq 0.$

Example 10

Let us give a second proof that

$\displaystyle \mathop{\mathrm{ch}}(C_{2n}) = 2$

for every positive integer ${n}$. Our proof will be an application of the Nullstellensatz.

Regard the colors as real numbers, and let ${S_i}$ be the set of colors at vertex ${i}$ (hence ${1 \le i \le 2n}$, and ${|S_i| = 2}$). Consider the polynomial

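$\displaystyle f = \left( x_1-x_2 \right)\left( x_2-x_3 \right) \dots \left( x_{2n-1}-x_{2n} \right)\left( x_{2n}-x_1 \right).$

The coefficient of ${x_1^1 x_2^1 \dots x_{2n}^1}$ is ${2 \neq 0}$: the only ways to obtain this monomial are to take the first term of every factor or the second term of every factor, and each contributes ${+1}$ because the number of factors is even. Therefore, one can select a color from each ${S_i}$ so that ${f}$ does not vanish, which means exactly that adjacent vertices receive different colors.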
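The coefficient computation can be double-checked by expanding the product directly. The sketch below stores a polynomial as a dictionary from exponent tuples to integer coefficients; the helper names are hypothetical and the variables are indexed from 0 rather than 1.

```python
from collections import defaultdict

def multiply_by_difference(poly, i, j):
    """Multiply poly (dict: exponent tuple -> coefficient) by (x_i - x_j)."""
    out = defaultdict(int)
    for exps, coeff in poly.items():
        for var, sign in ((i, 1), (j, -1)):
            e = list(exps)
            e[var] += 1
            out[tuple(e)] += sign * coeff
    return {e: c for e, c in out.items() if c != 0}

def cycle_polynomial(m):
    """(x_0 - x_1)(x_1 - x_2) ... (x_{m-1} - x_0) for a cycle on m vertices."""
    poly = {(0,) * m: 1}
    for i in range(m):
        poly = multiply_by_difference(poly, i, (i + 1) % m)
    return poly

def all_ones_coefficient(m):
    """Coefficient of x_0 x_1 ... x_{m-1} in the cycle polynomial."""
    return cycle_polynomial(m).get((1,) * m, 0)
```

For even `m` the value of `all_ones_coefficient(m)` is 2, while for odd `m` the two contributions cancel and it is 0, which is why this argument says nothing about odd cycles.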

## 3. The Graph Polynomial, and Directed Orientations

Motivated by Example 10, we wish to apply a similar technique to general graphs ${G}$. So in what follows, let ${G}$ be a (simple) graph with vertex set ${\{1, \dots, n\}}$.

Definition 11

The graph polynomial of ${G}$ is defined by

$\displaystyle f_G(x_1, \dots, x_n) = \prod_{\substack{(i,j) \in E(G) \\ i < j}} (x_i-x_j).$

We observe that the coefficients of ${f_G}$ count orientations of ${G}$ with signs. To be precise, we introduce the notation:

Definition 12

Consider orientations on the graph ${G}$ with vertex set ${\{1, \dots, n\}}$, meaning we assign a direction ${v \rightarrow w}$ to every edge of ${G}$ to make it into a directed graph. An oriented edge ${v \rightarrow w}$ is called ascending if ${v < w}$, i.e. the edge points from the smaller number to the larger one.

Then we say that an orientation is

• even if there are an even number of ascending edges, and
• odd if there are an odd number of ascending edges.

Finally, we define

• ${\mathop{\mathrm{DE}}_G(d_1, \dots, d_n)}$ to be the set of all even orientations of ${G}$ in which vertex ${i}$ has indegree ${d_i}$.
• ${\mathop{\mathrm{DO}}_G(d_1, \dots, d_n)}$ to be the set of all odd orientations of ${G}$ in which vertex ${i}$ has indegree ${d_i}$.

Set ${\mathop{\mathrm{D}}_G(d_1,\dots,d_n) = \mathop{\mathrm{DE}}_G(d_1,\dots,d_n) \cup \mathop{\mathrm{DO}}_G(d_1,\dots,d_n)}$.

Example 13

Consider the following orientation of the graph with edges ${\{1,2\}}$, ${\{2,3\}}$, ${\{2,4\}}$, ${\{3,4\}}$:

$\displaystyle 1 \rightarrow 2, \quad 3 \rightarrow 2, \quad 2 \rightarrow 4, \quad 4 \rightarrow 3.$

There are exactly two ascending edges, namely ${1 \rightarrow 2}$ and ${2 \rightarrow 4}$. The indegrees are ${d_1 = 0}$, ${d_2 = 2}$ and ${d_3 = d_4 = 1}$. Therefore, this particular orientation is an element of ${\mathop{\mathrm{DE}}_G(0,2,1,1)}$. In terms of ${f_G}$, this corresponds to the choice of terms

$\displaystyle \left( x_1- \boldsymbol{x_2} \right) \left( \boldsymbol{x_2}-x_3 \right) \left( x_2-\boldsymbol{x_4} \right) \left( \boldsymbol{x_3}-x_4 \right)$

which is a ${+ x_2^2 x_3 x_4}$ term.

Lemma 14

In the graph polynomial of ${G}$, the coefficient of ${x_1^{d_1} \dots x_n^{d_n}}$ is

$\displaystyle \left\lvert \mathop{\mathrm{DE}}_G(d_1, \dots, d_n) \right\rvert - \left\lvert \mathop{\mathrm{DO}}_G(d_1, \dots, d_n) \right\rvert.$

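Proof: Consider expanding ${f_G}$. Then each expanded term corresponds to a choice of ${x_i}$ or ${x_j}$ from each factor ${(x_i - x_j)}$, as in Example 13: choosing a variable means orienting the edge towards that vertex, so the exponent of ${x_k}$ is the indegree of ${k}$. Choosing ${x_j}$ (with ${i < j}$) contributes a factor of ${-1}$ and makes the edge ascending. Hence the term has coefficient ${+1}$ if the orientation is even, and ${-1}$ if the orientation is odd, as desired. $\Box$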
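Lemma 14 can be sanity-checked by brute force on the graph of Example 13, whose edges ${12}$, ${23}$, ${24}$, ${34}$ are read off from the factors displayed there. The sketch below (function names are assumptions) expands ${f_G}$ and separately tallies ${|\mathrm{DE}| - |\mathrm{DO}|}$ over all orientations.

```python
from collections import Counter
from itertools import product

EDGES = [(1, 2), (2, 3), (2, 4), (3, 4)]  # each edge written with i < j
N = 4

def graph_polynomial(edges, n):
    """Expand prod (x_i - x_j); returns dict: exponent tuple -> coefficient."""
    poly = Counter({(0,) * n: 1})
    for i, j in edges:
        nxt = Counter()
        for exps, c in poly.items():
            for v, sign in ((i, 1), (j, -1)):
                e = list(exps)
                e[v - 1] += 1
                nxt[tuple(e)] += sign * c
        poly = nxt
    return {e: c for e, c in poly.items() if c != 0}

def orientation_differences(edges, n):
    """For each indegree sequence, |DE| - |DO| (zero entries dropped)."""
    diff = Counter()
    for heads in product((0, 1), repeat=len(edges)):
        indeg = [0] * n
        ascending = 0
        for (i, j), h in zip(edges, heads):
            head = j if h else i   # h = 1 means i -> j, ascending since i < j
            indeg[head - 1] += 1
            ascending += h
        diff[tuple(indeg)] += 1 if ascending % 2 == 0 else -1
    return {d: c for d, c in diff.items() if c != 0}
```

The two dictionaries `graph_polynomial(EDGES, N)` and `orientation_differences(EDGES, N)` coincide, as the lemma predicts. A single orientation, such as the one in Example 13, contributes only one signed term; the coefficient itself is the sum over all orientations with the same indegrees.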

Thus we have an explicit combinatorial description of the coefficients in the graph polynomial ${f_G}$.

## 4. Coefficients via Eulerian Suborientations

We now give a second description of the coefficients of ${f_G}$.

Definition 15

Let ${D \in \mathop{\mathrm{D}}_G(d_1, \dots, d_n)}$, viewed as a directed graph. An Eulerian suborientation of ${D}$ is a subgraph of ${D}$ (not necessarily induced) in which every vertex has equal indegree and outdegree. We say that such a suborientation is

• even if it has an even number of edges, and
• odd if it has an odd number of edges.

Note that the empty suborientation is allowed. We denote the even and odd Eulerian suborientations of ${D}$ by ${\mathop{\mathrm{EE}}(D)}$ and ${\mathop{\mathrm{EO}}(D)}$, respectively.

Eulerian suborientations are brought into the picture by the following lemma.

Lemma 16

Assume ${D \in \mathop{\mathrm{DE}}_G(d_1, \dots, d_n)}$. Then there are natural bijections

$\displaystyle \begin{aligned} \mathop{\mathrm{DE}}_G(d_1, \dots, d_n) &\rightarrow \mathop{\mathrm{EE}}(D) \\ \mathop{\mathrm{DO}}_G(d_1, \dots, d_n) &\rightarrow \mathop{\mathrm{EO}}(D). \end{aligned}$

Similarly, if ${D \in \mathop{\mathrm{DO}}_G(d_1, \dots, d_n)}$ then there are bijections

$\displaystyle \begin{aligned} \mathop{\mathrm{DE}}_G(d_1, \dots, d_n) &\rightarrow \mathop{\mathrm{EO}}(D) \\ \mathop{\mathrm{DO}}_G(d_1, \dots, d_n) &\rightarrow \mathop{\mathrm{EE}}(D). \end{aligned}$

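Proof: Consider any orientation ${D' \in \mathop{\mathrm{D}}_G(d_1, \dots, d_n)}$. Then we define a suborientation of ${D}$, denoted ${D \rtimes D'}$, by including exactly the edges of ${D}$ whose orientation in ${D'}$ is in the opposite direction. Since ${D}$ and ${D'}$ have the same indegree at every vertex, each vertex gains as many incoming edges as it loses when passing from ${D}$ to ${D'}$, so ${D \rtimes D'}$ is Eulerian. Conversely, reversing the edges of any Eulerian suborientation of ${D}$ preserves all indegrees. Thus this induces a bijection

$\displaystyle D \rtimes - : \mathop{\mathrm{D}}_G(d_1, \dots, d_n) \rightarrow \mathop{\mathrm{EE}}(D) \cup \mathop{\mathrm{EO}}(D).$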

Moreover, remark that

• ${D \rtimes D'}$ is even if ${D}$ and ${D'}$ are either both even or both odd, and
• ${D \rtimes D'}$ is odd otherwise.

The lemma follows from this. $\Box$

Corollary 17

In the graph polynomial of ${G}$, the coefficient of ${x_1^{d_1} \dots x_n^{d_n}}$ is

$\displaystyle \pm \left( \left\lvert \mathop{\mathrm{EE}}(D) \right\rvert - \left\lvert \mathop{\mathrm{EO}}(D) \right\rvert \right)$

where ${D \in \mathop{\mathrm{D}}_G(d_1, \dots, d_n)}$ is arbitrary.

Proof: Combine Lemma 14 and Lemma 16. $\Box$
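As an illustration of Corollary 17, the quantity ${|\mathrm{EE}(D)| - |\mathrm{EO}(D)|}$ can be computed by enumerating all edge subsets of a small orientation. This is only a sketch: the function name and the encoding of ${D}$ as a list of `(tail, head)` arcs are assumptions.

```python
from itertools import combinations

def eulerian_difference(arcs, n):
    """|EE(D)| - |EO(D)| for a directed graph D on vertices 1..n,
    given as a list of arcs (tail, head). Exponential in len(arcs)."""
    diff = 0
    for k in range(len(arcs) + 1):
        for sub in combinations(arcs, k):
            balance = [0] * (n + 1)       # indegree minus outdegree
            for tail, head in sub:
                balance[tail] -= 1
                balance[head] += 1
            if not any(balance):          # Eulerian: every vertex balanced
                diff += 1 if k % 2 == 0 else -1
    return diff

# The directed 4-cycle, an orientation of C_4 with all indegrees equal to 1.
C4_CYCLIC = [(1, 2), (2, 3), (3, 4), (4, 1)]
# The orientation from Example 13.
EXAMPLE_13 = [(1, 2), (3, 2), (2, 4), (4, 3)]
```

For `C4_CYCLIC` the only Eulerian suborientations are the empty one and the whole cycle, both even, so the difference is 2, matching the coefficient found in Example 10. For `EXAMPLE_13` the empty suborientation is cancelled by the directed triangle ${2 \rightarrow 4 \rightarrow 3 \rightarrow 2}$, giving 0.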

We now arrive at the main result:

Theorem 18

Let ${G}$ be a graph on ${\{1, \dots, n\}}$, and let ${D \in \mathop{\mathrm{D}}_G(d_1, \dots, d_n)}$ be an orientation of ${G}$. If ${\left\lvert \mathop{\mathrm{EE}}(D) \right\rvert \neq \left\lvert \mathop{\mathrm{EO}}(D) \right\rvert}$, then given a list of ${d_i+1}$ colors at each vertex ${i}$ of ${G}$, there exists a proper coloring of the vertices of ${G}$ in which every vertex receives a color from its list.

In particular, ${G}$ is ${(1+\max_i d_i)}$-choosable.

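Proof: Combine Corollary 17 with Theorem 9, noting that ${f_G}$ has degree ${|E(G)| = d_1 + \dots + d_n}$ and that a choice of colors at which ${f_G}$ does not vanish is precisely a proper coloring. $\Box$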

## 5. Finding an orientation

Armed with Theorem 18, we are almost ready to prove Theorem 7. The last ingredient is that we need to find an orientation on ${G}$ in which the maximal indegree is not too large. This is accomplished by the following.

Lemma 19

Let ${L(G) \overset{\mathrm{def}}{=} \max_{H \subseteq G} |E(H)|/|V(H)|}$ as in Theorem 7. Then ${G}$ has an orientation in which every indegree is at most ${\left\lceil L(G) \right\rceil}$.

Proof: This is an application of Hall’s marriage theorem.

Let ${d = \left\lceil L(G) \right\rceil \ge L(G)}$. Construct a bipartite graph

$\displaystyle E \cup X \qquad \text{where}\qquad E = E(G) \quad\text{ and }\quad X = \underbrace{V(G) \sqcup \dots \sqcup V(G)}_{d \text{ times}}.$

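Connect ${e \in E}$ and ${v \in X}$ if ${v}$ is (a copy of) an endpoint of ${e}$. We claim Hall’s condition holds: if ${E' \subseteq E}$ is any set of edges and ${W}$ is the set of their endpoints, then ${E'}$ has ${d|W|}$ neighbors in ${X}$, and ${d|W| \ge L(G)|W| \ge |E'|}$ by applying the definition of ${L(G)}$ to the subgraph with vertex set ${W}$ and edge set ${E'}$. Hence we can match each edge in ${E}$ to a (copy of some) vertex in ${X}$, and we orient each edge towards the vertex it is matched to. Since there are exactly ${d}$ copies of each vertex in ${X}$, every indegree is at most ${d}$ and the conclusion follows. $\Box$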
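The matching in this proof can be found with the usual augmenting-path method. The following sketch is one possible way to do it and is not claimed to be the intended algorithm: the function name, the edge encoding, and passing ${d}$ as a parameter are all assumptions made for illustration.

```python
def bounded_indegree_orientation(edges, d):
    """Match every edge to one of d copies of one of its endpoints, then
    orient the edge towards that endpoint. Returns a list of arcs
    (tail, head) with all indegrees <= d, or None if no matching exists."""
    slot_owner = {}                      # (vertex, copy) -> index of edge

    def try_match(e, seen):
        # Augmenting-path search (Kuhn's algorithm) starting from edge e.
        for v in edges[e]:
            for copy in range(d):
                slot = (v, copy)
                if slot in seen:
                    continue
                seen.add(slot)
                if slot not in slot_owner or try_match(slot_owner[slot], seen):
                    slot_owner[slot] = e
                    return True
        return False

    for e in range(len(edges)):
        if not try_match(e, set()):
            return None

    head = {e: v for (v, _), e in slot_owner.items()}
    return [(a if head[i] == b else b, head[i])
            for i, (a, b) in enumerate(edges)]

# K_{2,4} with parts {1, 2} and {3, 4, 5, 6}; here L(G) = 4/3, so d = 2.
K24 = [(a, b) for a in (1, 2) for b in (3, 4, 5, 6)]
```

With `d = 2` the call `bounded_indegree_orientation(K24, 2)` produces an orientation of all 8 edges in which no vertex has indegree above 2. With `d = 1` it returns `None`, since 8 edges cannot be matched into only 6 slots.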

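Now we can prove Theorem 7. Proof: According to Lemma 19, pick ${D \in \mathop{\mathrm{D}}_G(d_1, \dots, d_n)}$ where ${\max d_i \le \left\lceil L(G) \right\rceil}$. Any Eulerian suborientation decomposes into edge-disjoint directed cycles, and since ${G}$ is bipartite it has no odd cycles, so every Eulerian suborientation has an even number of edges; that is, ${\mathop{\mathrm{EO}}(D) = \varnothing}$. On the other hand ${\mathop{\mathrm{EE}}(D)}$ contains the empty suborientation, so ${\left\lvert \mathop{\mathrm{EE}}(D) \right\rvert \ge 1 > 0 = \left\lvert \mathop{\mathrm{EO}}(D) \right\rvert}$. So Theorem 18 applies, giving that ${G}$ is ${(1+\max_i d_i)}$-choosable with ${\max_i d_i \le \left\lceil L(G) \right\rceil}$, and we are done. $\Box$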