I think it would be nice if every few years I updated my generic answer to “how do I get better at math contests?”. So here is the 2019 version. Unlike previous instances, I’m going to be a little less olympiad-focused than I usually am, since these days I get a lot of people asking for help on the AMC and AIME too.
(Historical notes: you can see the version from right after I graduated and the version from when I was still in high school. I admit both of them make me cringe slightly when I read them today. I still think everything written there is right, but the style and focus seems off to me now.)
0. Stop looking for the “right” training (or: be yourself)
These days many of the questions I get are clearly most focused on trying to find a perfect plan — questions like “what did YOU do to get to X” or “how EXACTLY do I practice for Y”. (Often these words are in all-caps in the email, too!) When I see these I always feel very hesitant to answer. The reason is that I always feel like there’s some implicit hope that I can give you some recipe that, if you follow it, will guarantee reaching your goals.
I’m sorry, math contests don’t work that way (and can’t work that way). I actually think that if I gave you a list of which chapters of which books I read in 2009-2010 over which weeks, and which problems I did on each day, and you followed it to the letter, it would go horribly.
Why? It’s not just a talent thing, I think. Solving math problems is actually a deeply personal art: despite being what might appear to be a cold and logical discipline, learning math and getting better at it actually requires being human. Different people find different things natural or unnatural, easy or hard, et cetera. If you try to squeeze yourself into some mold or timeline then the results will probably be counterproductive.
On the flip side, this means that you can worry a lot less. I actually think that surprisingly often, you can get a first-order approximation of what’s the “best” thing to do by simply doing whatever feels the most engaging or rewarding (assuming you like math, of course). Of course there are some places where this is not correct (e.g., you might hate geometry, but cannot just ignore it). But the first-order approximation is actually quite decent.
That’s why in the introduction to my geometry book, I explicitly have the line:
Readers are encouraged to not be bureaucratic in their learning and move around as they see fit, e.g., skipping complicated sections and returning to them later, or moving quickly through familiar material.
Put another way: as learning math is quite personal, the advice “be yourself” is well-taken.
1. Some brief recommendations (anyways)
With all that said, probably no serious harm will come from me listing a few references I think are reasonable — so that you have somewhere to start, and can adjust from there.
For learning theory and fundamentals:
For sources of additional practice problems (other than the particular test you’re preparing for):
- The collegiate contests HMMT November, PUMaC, CMIMC will typically have decent short-answer problems.
- HMMT February is by far the hardest short-answer contest I know of.
- At the olympiad level, there are so many national olympiads and team selection tests that you will never finish. (My website has an archive of USA problems and solutions if you’re interested in those in particular.)
The IMO Shortlist is also a good place to work, as it contains proposals of varying difficulty from many countries — and is thus the most culturally diverse. As for other nations, as a rule of thumb, any country that often finishes in the top 20 at the IMO (say) will probably have good questions on its national olympiad or TST.
For every subject that’s not olympiad geometry, there are actually surprisingly few named theorems.
2. Premature optimization is the root of all evil (so just get your hands dirty)
For some people, the easiest first step to getting better is to double the amount of time you spend practicing. (Unless that amount is zero, in which case, you should just start.)
There is a time and place for spending time thinking about how to practice — one example is if you’ve been working a while and feel like nothing has changed, or you’ve been working on some book and it just doesn’t feel fun, etc. Another common example is if you notice you keep missing all the functional equations on the USAMO: then, maybe it’s time to search up some handouts on functional equations. Put another way, if you feel stuck, then you can start thinking about whether you’re not doing something right.
On the other extreme, if you’re wondering whether you are ready to read book X or do problems from Y contest, my advice is to just try it and see if you like it. There is no commitment: just read Chapter 1, see how you feel. If it works, keep doing it; if not, try something else.
(I can draw an analogy from my own life. Whenever I am learning a new board game or card game, like Catan or Splendor or whatever, I always overthink it. I spend all this time thinking and theorizing and trying to come up with this brilliant strategy — which never works, because it’s my first game, for crying out loud. It turns out that until you start grappling at close range and getting your hands dirty, your internal model of something you’ve never done is probably not that good.)
3. Doing problems just above your level (and a bit on reflecting on them)
There is one pitfall that I do see sometimes, common enough I will point it out. If you mostly (only?) do old practice tests or past problems, then you’re liable to be spending too much time on easy problems. That was the topic of another old post of mine, but the short story is that if you find yourself constantly getting 130ish on AMC10 practice tests, then maybe you should spend most of your time working on problems 21-25 rather than repeatedly grinding 1-20 over and over. (See 28:30-29:00 here to hear Zuming make fun of them.)
The common wisdom is that you should consistently do problems just above your level so that you gradually increase the difficulty of problems you are able to solve. The situation is a little more nuanced at the AMC/AIME level, since for those short-answer contests it’s also important to be able to do routine problems quickly and accurately. However, I think for most people, you really should be spending at least 70% of your time getting smarter, rather than just faster.
I think in this case, I want to give concrete descriptions. Here are some examples of what can happen after a problem.
- You looked at the problem and immediately (already?) knew how to do it. Then you probably didn’t learn much from it. (But at least you’ll get faster, if not smarter.)
- You looked at the problem and didn’t know right away how to start, but after a little while figured it out. That’s a little better.
- You struggled with the problem and eventually figured out a solution, but maybe not the most elegant one. I think that’s a great situation to be in. You came up with some solution to the problem, so you understand it fairly well, but there’s still more for you to update your instincts on. What can you do in the future to get solutions more like the elegant one?
- You struggled with the problem and eventually gave up, then when you read the solution you realize quickly what you were missing. I think that’s a great situation to be in, too. You now want to update your instincts by a little bit — how could you make sure you don’t miss something like that again in the future?
- The official solution quoted some theorem you don’t know. If this was among a batch of problems where the other problems felt about the right level to you, then I think often this is a pretty good time to see if you can learn the statement (better, proof) of the theorem. You have just spent some time working on a situation in which the theorem was useful, so that data is fresh in your mind. And pleasantly often, you will find that ideas you came up with during your attempt on the problem correspond to ideas in the statement or proof of the theorem, which is great!
- You didn’t solve the problem, and the solution makes sense, but you don’t see how you would have come up with it. It’s possible that this is the fault of the solution’s author (many people are actually quite bad at making solutions read naturally). If you have a teacher, this is the right time to ask them about it. But it’s also possible that the problem was too hard. In general, I think it’s better to miss problems “by a little”, whatever that means, so that you can update your intuition correctly.
- You can’t even understand the solution. Okay, too hard.
You’ll notice how much emphasis I place on the post-problem reflection process. This is actually important — after all the time you spent working on the problem itself, you want to update your instincts as much as possible to get the payoff. In particular, I think it’s usually worth it to read the solutions to problems you worked on, whether or not you solve them. In general, after reading a solution, I think you should be able to state in a couple sentences all the main ideas of the solution, and basically know how to solve the problem from there.
For the olympiad level, I have a whole different post dedicated to reading solutions, and interested readers can read more there. (One point from that post I do want to emphasize since it wasn’t covered explicitly in any of the above examples: by USA(J)MO level it becomes important to begin building intuition that you can’t explicitly formalize. You may start having vague feelings and notions that you can’t quite put your finger on, but you can feel it. These non-formalizable feelings are valuable, take note of them.)
4. Leave your ego out (e.g. be willing to give up on problems)
This is easy advice to give, but it’s hard advice to follow. For concreteness, here are examples of things I think can be explained this way.
Sometimes people will ask me whether they need to solve every problem in each chapter of EGMO, or do every past practice test, or so on. The answer is: of course not, and why would you even think that? There’s nothing magical about doing 80% of the problems versus 100% of them. (If there was, then EGMO is secretly a terrible book, because I commented out some problems, and so OH NO YOU SKIPPED SOME AAAHHHHH.) And so it’s okay to start Chapter 5 even though you didn’t finish that last challenge problem at the end. Otherwise you let one problem prevent you from working on the next several.
Or, sometimes I learn about people who, if they do not solve an olympiad problem, will refuse to look at the solution; instead they will mark it in a spreadsheet to come back to later. In short, they never give up on a problem: which I think is a bad idea, since reflecting on missed problems is so important. (It is not as if you can realistically run out of olympiad problems to do.) And while this is still better than giving up too early, I mean, all things in moderation, right?
I think if people were somehow able to completely leave their ego out, and not worry at all about how good they are but rather just maximize learning, then mistakes like these two would be a lot rarer. Of course, this is impossible to do in practice (we’re all human), but it’s good to keep in mind at least that this is an ideal we can strive for.
5. Enjoy it
Which leads me to the one bit that everyone already knows, but that no platitude-filled post would be complete without: to do well at math contests (or anything hard) you probably have to enjoy the process of getting better. Not just the end result. You have to enjoy the work itself.
Which is not to say you have to do it all the time or for hours a day. Doing math is hard, so you get tired eventually, and beyond that forcing yourself to work is not productive. Thus when I see people talk about how they plan to do every shortlist problem, or they will work N hours per day over M time, I always feel a little uneasy, because it always seems too results-oriented.
In particular, I actually think it’s quite hard to spend more than two or three good hours per day on a regular basis. I certainly never did — back in high school (and even now), if I solved one problem that took me more than an hour, that was considered a good day. (But I should also note that the work ethic of my best students consistently amazes me; it far surpasses mine.) In that sense, the learning process can’t be forced or rushed.
There is one sense in which you can get more hours a day, that I am on record saying quite often: if you think about math in the shower, then you know you’re doing it right.
In the previous post we defined the $p$-adic numbers. This post will state (mostly without proof) some more surprising results about continuous functions $f \colon \mathbb{Z}_p \to \mathbb{Q}_p$. Then we give the famous proof of the Skolem-Mahler-Lech theorem using $p$-adic analysis.
1. Digression on $\mathbb{C}_p$
Before I go on, I want to mention that $\mathbb{Q}_p$ is not algebraically closed. So, we can take its algebraic closure $\overline{\mathbb{Q}_p}$ — but this field is now no longer complete (in the topological sense). However, we can then take the completion of this space to obtain $\mathbb{C}_p$. In general, completing an algebraically closed field leaves it algebraically closed, and so $\mathbb{C}_p$ is a larger space which is algebraically closed and complete. This space is called the $p$-adic complex numbers.
We won’t need $\mathbb{C}_p$ at all in what follows, so you can forget everything you just read.
2. Mahler coefficients: a description of continuous functions on $\mathbb{Z}_p$
One of the big surprises of $p$-adic analysis is that we can concretely describe all continuous functions $f \colon \mathbb{Z}_p \to \mathbb{Q}_p$. They are given by a basis of functions $\binom{x}{n} = \frac{x(x-1) \cdots (x-n+1)}{n!}$ in the following way: every continuous $f$ can be written uniquely as $f(x) = \sum_{n \ge 0} c_n \binom{x}{n}$ where $c_n \to 0$, and conversely every such series defines a continuous function (this is Mahler's theorem). The $c_n$ are called the Mahler coefficients of $f$.
You’ll note that the coefficients $c_n = (\Delta^n f)(0)$ are the same finite differences that one uses on polynomials in high school math contests, which is why they are also called “Mahler differences”.
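To make the finite-difference description concrete, here is a small script of my own (not from the original post) computing the coefficients $c_n = (\Delta^n f)(0) = \sum_{k=0}^{n} (-1)^{n-k} \binom{n}{k} f(k)$; for a polynomial of degree $d$, the coefficients vanish past $n = d$.

```python
from math import comb

def mahler_coefficients(f, n_max):
    # c_n = (Delta^n f)(0) = sum_{k=0}^{n} (-1)^(n-k) * C(n, k) * f(k)
    return [
        sum((-1) ** (n - k) * comb(n, k) * f(k) for k in range(n + 1))
        for n in range(n_max + 1)
    ]

# f(x) = x^2 has Mahler expansion x^2 = 0*C(x,0) + 1*C(x,1) + 2*C(x,2)
print(mahler_coefficients(lambda x: x * x, 4))  # [0, 1, 2, 0, 0]
```

Over $\mathbb{Z}_p$ the same formula recovers the Mahler coefficients of a continuous function from its values at $0, 1, 2, \dots$.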
Thus one can think of Mahler’s theorem as saying that the values $f(0), f(1), f(2), \dots$ behave like a polynomial modulo $p^e$ for every $e$. Amusingly, this fact was used on a USA TST in 2011:
3. Analytic functions
We say that a function $f \colon \mathbb{Z}_p \to \mathbb{Q}_p$ is analytic if it has a power series expansion $f(x) = \sum_{n \ge 0} a_n x^n$ converging for every $x \in \mathbb{Z}_p$.
As before there is a characterization in terms of the Mahler coefficients:
Just as a nonzero holomorphic function has finitely many zeros on any compact set, we have the following result (Strassman’s theorem) on analytic functions on $\mathbb{Z}_p$.
We close off with an application of the analyticity results above.
Proof: According to the theory of linear recurrences, there exist a matrix $A$ and vectors $u$, $v$ such that we can write $x_n$ as a dot product $x_n = u^\top A^n v$.
Let $p$ be a prime not dividing $\det A$. Let $T$ be an integer such that $A^T \equiv \mathrm{id} \pmod p$ (such a $T$ exists, since $A \bmod p$ lies in the finite group of invertible matrices over $\mathbb{F}_p$).
Fix any residue class $r$ modulo $T$. We will prove that among the terms $x_r, x_{r+T}, x_{r+2T}, \dots$, either all of them are zero, or at most finitely many of them are. This will conclude the proof.
Let $A^T = \mathrm{id} + pB$ for some integer matrix $B$. For $n \ge 0$ we have $x_{r+nT} = u^\top A^r (\mathrm{id} + pB)^n v = \sum_{k \ge 0} \binom{n}{k} p^k \cdot u^\top A^r B^k v$. Thus, setting $f(n) = x_{r+nT}$, we have written $f$ in Mahler form. Initially, $f$ is defined only for nonnegative integers $n$, but by Mahler’s theorem (since the coefficients $p^k \cdot u^\top A^r B^k v$ tend to zero) it follows that $f$ extends to a continuous function $f \colon \mathbb{Z}_p \to \mathbb{Q}_p$. Also, one can check the coefficients decay quickly enough that $f$ is even analytic.
Thus by Strassman’s theorem, $f$ is either identically zero, or else it has finitely many zeros, as desired.
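The starting point of the proof, writing a linear recurrence as a dot product $x_n = u^\top A^n v$, can be checked concretely. Below is a minimal sketch of my own, using the Fibonacci recurrence as a hypothetical instance (any linear recurrence works the same way, with $A$ its companion matrix).

```python
# Express a linear recurrence as x_n = u . A^n . v.
# For Fibonacci, A = [[1, 1], [1, 0]] is the companion matrix, and taking
# u = (1, 0), v = (0, 1)^T picks out the top-right entry of A^n.
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matpow(A, n):
    R = [[1, 0], [0, 1]]  # 2x2 identity
    while n:
        if n & 1:
            R = matmul(R, A)
        A = matmul(A, A)
        n >>= 1
    return R

def x(n):
    return matpow([[1, 1], [1, 0]], n)[0][1]  # u . A^n . v

print([x(n) for n in range(8)])  # [0, 1, 1, 2, 3, 5, 8, 13]
```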
I think this post is more than two years late in coming, but anyhow…
This post introduces the $p$-adic integers $\mathbb{Z}_p$ and the $p$-adic numbers $\mathbb{Q}_p$. The one-sentence description is that these are “integers/rationals carrying full mod $p^e$ information” (and only that information).
The first four sections will cover the founding definitions culminating in a short solution to a USA TST problem.
In this whole post, $p$ is always a prime. Much of this is based off of Chapter 3A from Straight from the Book.
Before really telling you what $\mathbb{Z}_p$ and $\mathbb{Q}_p$ are, let me tell you what you might expect them to do.
In elementary/olympiad number theory, we’re already well-familiar with the following two ideas:
- Taking modulo a prime $p$ or prime power $p^e$, and
- Looking at the exponent $\nu_p$ (the number of times $p$ divides a given quantity).
Let me expand on the first point. Suppose we have some Diophantine equation. In olympiad contexts, one can take the equation modulo $p$ to gain something else to work with. Unfortunately, taking modulo $p$ loses some information: the reduction $\mathbb{Z} \to \mathbb{Z}/p$ is far from injective.
If we want finer control, we could consider instead taking modulo $p^2$, rather than taking modulo $p$. This can also give some new information (cubes modulo $9$, anyone?), but it has the disadvantage that $\mathbb{Z}/p^2$ isn’t a field, so we lose a lot of the nice algebraic properties that we got by taking modulo $p$.
One of the goals of $p$-adic numbers is to get around these two issues I described. The $p$-adic numbers we introduce are going to have the following properties:
- You can “take modulo $p^e$ for all $e$ at once”. In olympiad contexts, we are used to picking a particular modulus and then seeing what happens if we take that modulus. But with $p$-adic numbers, we won’t have to make that choice. An equation of $p$-adic numbers carries enough information to take modulo $p^e$ for every $e$.
- The numbers form a field $\mathbb{Q}_p$, the nicest possible algebraic structure: division makes sense. Contrast this with $\mathbb{Z}/p^2$, which is not even an integral domain.
- It doesn’t lose as much information as taking modulo $p$ does: rather than the surjective reduction $\mathbb{Z} \twoheadrightarrow \mathbb{Z}/p$ we have an injective map $\mathbb{Z} \hookrightarrow \mathbb{Z}_p$.
- Despite this, you “ignore” some “irrelevant” data. Just like taking modulo $p$, you want to zoom in on a particular type of algebraic information, and this means necessarily losing sight of other things. (To draw an analogy: an equation expressing $-1$ as a sum of squares has no integer solutions, because, well, squares are nonnegative. But such an equation can have solutions modulo a prime $p$, because once you take modulo $p$ you stop being able to talk about numbers being nonnegative. The same thing will happen if we work in $p$-adics: such an equation can have a solution in $\mathbb{Z}_p$.)
So, you can think of $p$-adic numbers as the right tool to use if you only really care about modulo $p^e$ information, but the usual $\mathbb{Z}/p^e$ isn’t quite powerful enough.
To be more concrete, I’ll give a poster example now:
Here is a problem where we clearly only care about mod-$p$-type information. Yet it’s a nontrivial challenge to do the necessary manipulations mod $p^e$ (try it!). The basic issue is that there is no good way to deal with the denominators (in part, $\mathbb{Z}/p^e$ is not even an integral domain).
However, with $p$-adic analysis we’re going to be able to overcome these limitations and give a “straightforward” proof by using a convergent infinite series (a generalized binomial expansion). Such an identity makes no sense over $\mathbb{Q}$ or $\mathbb{R}$ for convergence reasons, but it will work fine over the $p$-adics, which is all we need.
2. Algebraic perspective
We now construct $\mathbb{Z}_p$ and $\mathbb{Q}_p$. I promised earlier that a $p$-adic integer will let you look at “all residues modulo $p^e$” at once. This definition will formalize this.
2.1. Definition of $\mathbb{Z}_p$
In this way we get an injective map $\mathbb{Z} \hookrightarrow \mathbb{Z}_p$, which is not surjective. So there are more $p$-adic integers than usual integers.
(Remark for experts: those of you familiar with category theory might recognize that this definition can be written concisely as $\mathbb{Z}_p = \varprojlim_e \mathbb{Z}/p^e$, where the inverse limit is taken across $e \ge 1$.)
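The compatible-sequence definition is easy to play with on a computer. As an illustration of my own (not from the post), the script below truncates the $p$-adic integer $\frac{1}{1-p}$ at each level $p^e$ and checks that the residues are compatible.

```python
# A p-adic integer is a sequence of residues mod p, p^2, p^3, ... that are
# compatible: each residue reduces to the previous one. Here: 1/(1 - p) in Z_3.
p = 3
residues = [pow(1 - p, -1, p ** e) for e in range(1, 7)]
print(residues)  # [1, 4, 13, 40, 121, 364]

# compatibility: the residue mod p^(e+1) reduces to the residue mod p^e
assert all(residues[i] % p ** i == residues[i - 1] for i in range(1, len(residues)))
```

(The residues $1, 4, 13, 40, \dots$ are exactly the partial sums $1 + p + p^2 + \cdots$, foreshadowing the geometric series discussed later in the post.)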
2.2. Base $p$ expansion
Here is another way to think about $p$-adic integers using “base $p$”. As in the example earlier, every usual integer can be written in base $p$.
More generally, given any $x \in \mathbb{Z}_p$, we can write down a “base $p$” expansion, in the sense that there are exactly $p$ choices of $x \bmod p^{e+1}$ given $x \bmod p^e$. In general we can write $x = \sum_{e \ge 0} a_e p^e = \overline{\cdots a_2 a_1 a_0}_p$, where $a_e \in \{0, 1, \dots, p-1\}$, such that the equation holds modulo $p^e$ for each $e$. Note the expansion is infinite to the left, which is different from what you’re used to.
(Amusingly, negative integers also have infinite base $p$ expansions; for example $-1 = \overline{\cdots (p-1)(p-1)(p-1)}_p$, corresponding to $-1 \equiv p^e - 1 \pmod{p^e}$ for every $e$.)
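A short script (my own illustration) extracts base-$p$ digits of any integer from its residues mod $p^e$, confirming that negative integers get expansions that continue forever to the left:

```python
def padic_digits(x, p, k):
    # first k base-p digits a_0, a_1, ..., a_{k-1} of x, read off from x mod p^k
    x %= p ** k
    digits = []
    for _ in range(k):
        digits.append(x % p)
        x //= p
    return digits

# -1 has every base-p digit equal to p - 1
print(padic_digits(-1, 3, 5))  # [2, 2, 2, 2, 2]
```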
Thus you may often hear the advertisement that a $p$-adic integer is a “possibly infinite base $p$ expansion”. This is correct, but later on we’ll be thinking of $\mathbb{Z}_p$ in a more and more “analytic” way, and so I prefer to think of this as a “Taylor series with base $p$”. Indeed, much of your intuition from generating functions $K[[X]]$ (where $K$ is a field) will carry over to $\mathbb{Z}_p$.
Here is one way in which your intuition from generating functions carries over:
Contrast this with the corresponding statement for generating functions: a power series $\sum_{n \ge 0} a_n X^n$ is invertible iff $a_0 \neq 0$.
Proof: If $x \equiv 0 \pmod p$, then $x \bmod p = 0$ is not invertible, so clearly $x$ is not invertible either. Otherwise $x \not\equiv 0 \pmod p$, so $x$ is coprime to $p^e$ for all $e$, and we can take an inverse $y_e$ modulo $p^e$, with $x y_e \equiv 1 \pmod{p^e}$. As the $y_e$ are themselves compatible, the element $y = (y_1, y_2, \dots) \in \mathbb{Z}_p$ is an inverse.
With this observation, here is now the definition of $\mathbb{Q}_p$.
Continuing our generating functions analogy:
This means $\mathbb{Q}_p$ is “Laurent series with base $p$”, and in particular according to the earlier proposition we deduce:
Thus, continuing our base $p$ analogy, elements of $\mathbb{Q}_p$ are in bijection with “Laurent series” $\sum_{e \ge -n} a_e p^e$ for digits $a_e \in \{0, 1, \dots, p-1\}$. So the base $p$ representations of elements of $\mathbb{Q}_p$ can be thought of as the same as usual, but extending infinitely far to the left (rather than to the right).
(Fair warning: the field $\mathbb{Q}_p$ has characteristic zero, not $p$.)
(At this point I want to make a remark about division, connecting it to the wish-list of properties I had before. In elementary number theory you can take equations modulo $p^e$, but if you do, a quantity like $\frac{x}{p}$ doesn’t make sense unless you know $x$ modulo $p^{e+1}$. You can’t fix this by just taking modulo $p^{e+1}$, since then you need $x$ modulo $p^{e+2}$, ad infinitum. You can work around issues like this, but the nice feature of $\mathbb{Z}_p$ and $\mathbb{Q}_p$ is that you have modulo $p^e$ information for all $e$ “all at once”: an element of $\mathbb{Q}_p$ packages all the modulo $p^e$ information simultaneously. So you can divide by $p$ with no repercussions.)
3. Analytic perspective
Up until now we’ve been thinking about things mostly algebraically, but moving forward it will be helpful to start using the language of analysis. Usually, two real numbers are considered “close” if they are close on the number line, but for $p$-adic purposes we only care about modulo $p^e$ information. So, we’ll instead think of two elements of $\mathbb{Z}_p$ or $\mathbb{Q}_p$ as “close” if their difference is divisible by a large power of $p$.
For this we’ll borrow the familiar valuation $\nu_p$ from elementary number theory, extended to $\mathbb{Q}_p$, and define the $p$-adic absolute value by $|x|_p = p^{-\nu_p(x)}$ (with $|0|_p = 0$).
This fulfills the promise that $x$ and $y$ are close if they look the same modulo $p^e$ for large $e$; in that case $\nu_p(x - y)$ is large and accordingly $|x - y|_p$ is small.
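As a quick illustration of my own (function names chosen for the sketch), here is $\nu_p$ and the corresponding absolute value for rational inputs:

```python
from fractions import Fraction

def vp(x, p):
    # p-adic valuation of a nonzero rational: factors of p in the numerator
    # count positively, factors in the denominator count negatively
    x = Fraction(x)
    num, den, v = x.numerator, x.denominator, 0
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v

def abs_p(x, p):
    return 0.0 if x == 0 else float(p) ** (-vp(x, p))

print(vp(Fraction(50, 3), 5), vp(Fraction(1, 3), 3))  # 2 -1
```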
3.2. Ultrametric space
In this way, $\mathbb{Z}_p$ and $\mathbb{Q}_p$ become metric spaces with metric given by $d(x, y) = |x - y|_p$.
In fact, these spaces satisfy a stronger form of the triangle inequality than you are used to from $\mathbb{R}$: namely $|x + y|_p \le \max\left( |x|_p, |y|_p \right)$.
However, $\mathbb{Q}_p$ is more than just a metric space: it is a field, with its own addition and multiplication. This means we can do analysis just like in $\mathbb{R}$ or $\mathbb{C}$: basically, any notion such as “continuous function”, “convergent series”, et cetera has a $p$-adic analog. In particular, we can define what it means for an infinite sum to converge:
With this definition in place, the “base $p$” discussion we had earlier is now true in the analytic sense: if $x = \overline{\cdots a_2 a_1 a_0}_p$ then $x = \sum_{e \ge 0} a_e p^e$ as a convergent series. Indeed, the difference between $x$ and the $N$th partial sum is divisible by $p^N$, hence the partial sums approach $x$ as $N \to \infty$.
While the definitions are all the same, there are some changes in which properties hold. For example, in $\mathbb{Q}_p$ convergence of partial sums is simpler: a series $\sum_n a_n$ converges if and only if its terms satisfy $a_n \to 0$.
Contrast this with the harmonic series $\sum \frac{1}{n}$ in $\mathbb{R}$, which diverges even though its terms tend to zero. You can think of this as a consequence of the strong triangle inequality. Proof: By multiplying by a large enough power of $p$, we may assume each $a_n \in \mathbb{Z}_p$. (This isn’t actually necessary, but makes the notation nicer.)
Observe that the partial sums modulo $p$ must eventually stabilize, since for large enough $n$ we have $a_n \equiv 0 \pmod p$. So let $b_1$ be the eventual residue modulo $p$ of the partial sums. In the same way let $b_2$ be the eventual residue modulo $p^2$, and so on. Then one can check the partial sums approach the limit $b = (b_1, b_2, \dots)$.
Here are a couple of exercises to get you used to thinking of $\mathbb{Z}_p$ and $\mathbb{Q}_p$ as metric spaces.
3.3. More fun with geometric series
While we’re at it, let’s finally state the $p$-adic analog of the geometric series formula: if $|x|_p < 1$, then $\frac{1}{1-x} = 1 + x + x^2 + \cdots$.
Proof: Note that the partial sums satisfy $1 + x + \cdots + x^{n-1} = \frac{1 - x^n}{1 - x}$, and $x^n \to 0$ as $n \to \infty$ since $|x|_p < 1$.
So, $\frac{1}{1-p} = 1 + p + p^2 + \cdots$ is really a correct convergence in $\mathbb{Z}_p$. And so on.
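This convergence can be watched at finite level: the partial sums of $1 + p + p^2 + \cdots$ agree with the inverse of $1 - p$ modulo $p^k$. A minimal sketch of my own (parameters chosen for illustration):

```python
p, k = 5, 6
target = pow(1 - p, -1, p ** k)          # 1/(1 - p) truncated mod p^k
partial = sum(p ** n for n in range(k))  # 1 + p + ... + p^(k-1)
print(partial % p ** k == target)  # True
```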
If you buy the analogy that $\mathbb{Z}_p$ is generating functions with base $p$, then all the olympiad generating-function identities you might be used to have $p$-adic analogs. For example, you can prove more generally that:
(I haven’t defined $(1+x)^r$ for general exponents $r$, but it has the properties you expect.) The proof is as in the real case; even the theorem statement is the same except for the extra subscript of $p$. I won’t elaborate too much on this now, since $p$-adic exponentiation will be described in much more detail in the next post.
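To see a generalized binomial series in action, here is a sketch of my own (parameters chosen for illustration) computing truncations of $(1+p)^{1/2} = \sum_{n \ge 0} \binom{1/2}{n} p^n$ and checking that they square to $1 + p$ modulo $p^k$:

```python
from fractions import Fraction
from math import prod

p, k = 5, 6
M = p ** k

def binom_half(n):
    # generalized binomial coefficient C(1/2, n) as an exact rational
    return prod(Fraction(1, 2) - i for i in range(n)) / prod(range(1, n + 1)) if n else Fraction(1)

s = 0
for n in range(2 * k):  # enough terms that the tail vanishes mod p^k
    term = binom_half(n) * Fraction(p) ** n  # exact rational term of the series
    # the reduced denominator is coprime to p, so it is invertible mod p^k
    s = (s + term.numerator * pow(term.denominator, -1, M)) % M

print(s * s % M == (1 + p) % M)  # True: s is a square root of 1 + p in Z_5
```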
Note that the definition of $|\bullet|_p$ could have been given for $\mathbb{Q}$ as well; we didn’t need $\mathbb{Q}_p$ to introduce it (after all, we have $\nu_p$ in olympiads already). The big important theorem I must state now is that $\mathbb{Q}_p$ is the completion of $\mathbb{Q}$ with respect to the metric induced by $|\bullet|_p$.
This is the definition of $\mathbb{Q}_p$ you’ll see more frequently; one then defines $\mathbb{Z}_p$ in terms of $\mathbb{Q}_p$ (rather than vice-versa) according to $\mathbb{Z}_p = \left\{ x \in \mathbb{Q}_p : |x|_p \le 1 \right\}$.
Let me justify why this definition is philosophically nice.
Suppose you are a numerical analyst and you want to estimate the value of the sum $\frac{1}{1^2} + \frac{1}{2^2} + \cdots + \frac{1}{N^2}$ (for some large $N$) to within $0.01$. The sum consists entirely of rational numbers, so the problem statement would be fair game for ancient Greece. But it turns out that in order to get a good estimate, it really helps if you know about the real numbers: because then you can construct the infinite series $\sum_{n \ge 1} \frac{1}{n^2} = \frac{\pi^2}{6}$, and deduce that the sum is close to $\frac{\pi^2}{6}$, up to some small error term from the terms past $N$, which can be bounded.
Of course, in order to have access to enough theory to prove that $\sum_{n \ge 1} \frac{1}{n^2} = \frac{\pi^2}{6}$, you need to have the real numbers; it’s impossible to do serious analysis in the non-complete space $\mathbb{Q}$, where e.g. the sequence $1, 1.4, 1.41, 1.414, \dots$ is considered “not convergent” because $\sqrt{2} \notin \mathbb{Q}$. Instead, all analysis is done in the completion of $\mathbb{Q}$, namely $\mathbb{R}$.
Now suppose you are an olympiad contestant and want to estimate the sum from Example 1 to within mod $p^e$ (i.e. to within $p^{-e}$ in $|\bullet|_p$). Even though the quantity in question is a rational number, it still helps to be able to do analysis with infinite sums, and then bound the error term (i.e. take mod $p^e$). But the space $\mathbb{Q}$ is not complete with respect to $|\bullet|_p$ either, and thus it makes sense to work in the completion of $\mathbb{Q}$ with respect to $|\bullet|_p$. This is exactly $\mathbb{Q}_p$.
4. Solving USA TST 2002/2
Let’s finally solve Example 1, which asks us to compute the sum quoted at the beginning of the post.
Armed with the generalized binomial theorem, this becomes straightforward.
Using a couple of elementary congruences for the binomial coefficients involved, this solves the problem.
Some of the stuff covered in this handout:
- Advice for constructing the triangle centers (hint: circumcenter goes first)
- An example of how to rearrange the conditions of a problem and draw a diagram out-of-order
- Some mechanical suggestions such as dealing with phantom points
- Some examples of computer-generated figures
One of the major headaches of using complex numbers in olympiad geometry problems is dealing with square roots. In particular, it is nontrivial to express the incenter of a triangle inscribed in the unit circle in terms of its vertices.
The following lemma is the standard way to set up the arc midpoints of a triangle. It appears for example as part (a) of Lemma 6.23.
Theorem 1 is often used in combination with the following lemma, which lets one assign the incenter the coordinates $-(xy + yz + zx)$ in the above notation.
Unfortunately, the proof of Theorem 1 in my textbook is wrong, and I cannot find a proof online (though I hear that Lemmas in Olympiad Geometry has a proof). So in this post I will give a correct proof of Theorem 1, which will hopefully also explain the mysterious introduction of the minus signs in the theorem statement. In addition I will give a version of the theorem valid for quadrilaterals.
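As a numeric sanity check of my own (not part of the original exposition), the script below verifies that for a triangle inscribed in the unit circle with vertices $x^2$, $y^2$, $z^2$, some choice of the square roots $x$, $y$, $z$ makes $-(xy + yz + zx)$ equal to the incenter computed classically from side lengths. The vertex angles are arbitrary choices for the test.

```python
import cmath
import itertools

# Triangle on the unit circle; the angles are arbitrary test values.
a, b, c = (cmath.exp(1j * t) for t in (0.3, 1.9, 4.1))

# Classical incenter: weights are the side lengths opposite each vertex.
la, lb, lc = abs(b - c), abs(c - a), abs(a - b)
incenter = (la * a + lb * b + lc * c) / (la + lb + lc)

# Try all sign choices of the square roots; the lemma says one of them works.
x0, y0, z0 = cmath.sqrt(a), cmath.sqrt(b), cmath.sqrt(c)
ok = any(
    abs(-((sx * x0) * (sy * y0) + (sy * y0) * (sz * z0) + (sz * z0) * (sx * x0)) - incenter) < 1e-9
    for sx, sy, sz in itertools.product((1, -1), repeat=3)
)
print(ok)  # True
```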
2. A Word of Warning
I should at once warn the reader that Theorem 1 is an existence result, and thus must be applied carefully.
To see why this matters, consider the following problem, which appeared as problem 1 of the 2016 JMO.
By experimenting with the diagram, it is not hard to guess that the correct fixed point is the relevant arc midpoint, as seen in the figure below. One might be tempted to pick square roots of the vertices via Theorem 1, assert that the two incenters are then given by the incenter formula, and read off the fixed point.
This is a mistake! If one applies Theorem 1 twice, then the choices of “square roots” of the common vertices may not be compatible. In fact, they cannot be compatible, because the two triangles require different arc midpoints at the shared vertices.
In fact, I claim this is not a minor issue that one can work around. The claim about which arc midpoint the relevant circumcircle passes through becomes false once the moving point crosses to the other arc; in that case the circle actually passes through the antipodal arc midpoint instead. Thus the truth of the problem really depends on the fact that the quadrilateral is convex, and any attempt with complex numbers must take this into account to have a chance of working.
3. Proof of the theorem for triangles
Fix $a$, $b$, $c$ now, so we require $x^2 = a$, $y^2 = b$, $z^2 = c$. There are $2^3 = 8$ choices of square roots $x$, $y$, $z$ we can take (each differing by a sign); we wish to show one of them works.
We pick an arbitrary choice for $x$ first. Then, of the two choices of $y$, we pick the one such that $-xy$ is the prescribed arc midpoint. Similarly, of the two choices of $z$, we pick the one such that $-zx$ is the prescribed arc midpoint. Our goal is to show that under these conditions, $-yz$ is the prescribed arc midpoint again.
The main trick is to now consider the remaining arc midpoint, which we denote by $M$. It is easy to see that:
Thus, we have
From this we can see why the minus signs are necessary.
4. A version for quadrilaterals
We now return to the setting of a convex quadrilateral that we encountered in Example 3. Suppose we preserve the variables $x$, $y$, $z$ that we were given from Theorem 1, but now add a fourth complex number $w$ with $w^2 = d$, where $d$ is the fourth vertex. How are the new arc midpoints determined? The following theorem answers this question.
This setup is summarized in the following figure.
Note that unlike Theorem 1, the four arc midpoints cut out by the sides of the quadrilateral do not all have the same sign. This asymmetry is inevitable (see if you can understand why from the proof below).
Proof: We select $x$, $y$, $z$ with Theorem 1. Now, pick the choice of $w$ for which the arc midpoint adjacent to the fourth vertex comes out on the correct arc; this determines one of the new arc midpoints directly. On the other hand, the calculation for the next midpoint follows by applying Lemma 4 again, this time to the appropriate triangle. The remaining midpoint is computed similarly.
In other problems, the four vertices of the quadrilateral may play more symmetric roles and in that case it may be desirable to pick a setup in which the four vertices are labeled in order. By relabeling the letters in Theorem 6 one can prove the following alternate formulation.
To test the newfound theorem, here is a cute easy application.
In a previous post I tried to make the point that math olympiads should not be judged by their relevance to research mathematics. In doing so I failed to actually explain why I think math olympiads are a valuable experience for high schoolers, so I want to make amends here.
In high school I used to think that math contests were primarily meant to encourage contestants to study some math that is (much) more interesting than what’s typically shown in high school. While I still think this is one goal, and maybe it still is the primary goal in some people’s minds, I no longer believe this is the primary benefit.
My current belief is that there are two major benefits from math competitions:
- To build a social network for gifted high school students with similar interests.
- To provide a challenging experience that lets gifted students grow and develop intellectually.
I should at once disclaim that I do not claim these are the only purposes of mathematical olympiads. Indeed, mathematics is a beautiful subject, and introducing competitors to this field of study is of course a great thing (in particular, it was life-changing for me). But as I have said before, many alumni of math olympiads do not eventually become mathematicians, and so I would like to make the case that these alumni have gained a lot from the experience anyway.
2. Social experience
Now that we have email, Facebook, Art of Problem Solving, and whatnot, the math contest community is much larger and stronger than it’s ever been in the past. For the first time, it’s really possible to stay connected with other competitors throughout the entire year, rather than just seeing each other a handful of times during contest season. There are literally group chats of contestants all over the country where people talk about math problems or the solar eclipse or share funny pictures or inside jokes or everything else. In many ways, being part of the high school math contest community is a lot like having access to the peer group at a top-tier university, except four years earlier.
There’s some concern that a competitive culture is unhealthy for the contestants. I want to make a brief defense here.
I really do think that the contest community is good at being collaborative rather than competitive. You can imagine a world where the competitors think about contests in terms of trying to get a better score than the other person.  That would not be a good world. But I think by and large the community is good at thinking about it as just trying to maximize their own score. The score of the person next to you isn’t supposed to matter (and thinking about it doesn’t help, anyways).
Put more bluntly, on contest day, you have one job: get full marks. 
Because we have a culture of this shape, we now get a group of talented students all working towards the same thing, rather than against one another. That’s what makes it possible to have a self-supportive community, and what makes it possible for the contestants to really become friends with each other.
I think the strongest contestants don’t even care about the results of contests other than the few really important ones (like USAMO/IMO). It is a long-running joke that the Harvard-MIT Math Tournament is secretly just a MOP reunion, and I personally see to it that this happens every year. 
I’ve also heard similar sentiments about ARML:
I enjoy ARML primarily based on the social part of the contest, and many people agree with me; the highlight of ARML for some people is the long bus ride to the contest. Indeed, I think of ARML primarily as a social event, with some mathematics to make it look like the participants are actually doing something important.
(Don’t tell the parents.)
3. Intellectual growth
My view is that if you spend a lot of time thinking about or working on anything deep, then you will learn and grow from the experience, almost regardless of what that thing is at an object level. Take chess as an example — even though chess definitely has even fewer “real-life applications” than math, if you take anyone with a 2000+ rating I don’t think many of them would think that the time they invested into the game was wasted.
Olympiad mathematics seems to be no exception to this. In fact the sheer depth and difficulty of the subject probably makes it a particularly good example. 
I’m now going to fill this section with a bunch of examples although I don’t claim the list is exhaustive. First, here are the ones that everyone talks about and more or less agrees on:
- Learning how to think, because, well, that’s how you solve a contest problem.
- Learning to work hard and not give up, because the contest is difficult and you will not win by accident; you need to actually go through a lot of training.
- Dual to above, learning to give up on a problem, because sometimes the problem really is too hard for you and you won’t solve it even if you spend another ten or twenty or fifty hours, and you have to learn to cut your losses. There is a balancing act here that I think really is best taught by experience, rather than the standard high-school moral cheerleading where you are supposed to “never give up” or something.
- But also learning to be humble or to ask for help, which is a really hard thing for a lot of young contestants to do.
- Learning to be patient, not only with solving problems but with the entire journey. You usually do not improve dramatically overnight.
Here are some others I also believe, but don’t hear as often.
- Learning to be independent, because odds are your high-school math teacher won’t be able to help you with USAMO problems. Training for the highest level of contests is these days almost always done more or less independently. I think having the self-motivation to do the training yourself, as well as the capacity to essentially have to design your own training (making judgments on what to work on, et cetera) is itself a valuable cross-domain skill. (I’m a little sad sometimes that by teaching I deprive my students of the opportunity to practice this. It is a cost.)
- Being able to work neatly, not because your parents told you to but because if you are sloppy then it will cost you points when you make small (or large) errors on IMO #1. Olympiad problems are difficult enough as is, and you do not want to let them become any harder because of your own sloppiness. (And there are definitely examples of olympiad problems which are impossible to solve if you are not organized.)
- Being able to organize and write your thoughts well, because some olympiad problems are complex and require putting together more than one lemma or idea to solve. For this to work, you need to have the skill of putting together a lot of moving parts into a single coherent argument. Bonus points here if your audience is someone you care about (as opposed to a grader), because then you have to also worry about making the presentation as clean and natural as possible.
These days, whenever I solve a problem I always take the time to write it up cleanly, because in the process of doing so I nearly always find ways that the solution can be made shorter or more elegant, or at least philosophically more natural. (I also often find my solution is wrong.) So it seems that the write-up process here is not merely about presenting the same math in different ways: the underlying math really does change. 
- Thinking about how to learn. For example, the Art of Problem Solving forums are often filled with questions of the form “what should I do?”. Many older users find these questions obnoxious, but I find them desirable. I think being able to spend time pondering about what makes people improve or learn well is a good trait to develop, rather than mindlessly doing one book after another.
Of course, many of the questions I referred to are poor, often with no real specific direction: the questions are essentially “what book should I read?”, or “give me an exhaustive list of everything I should know”. But I think this is inevitable because these are people’s first attempts at understanding contest training. Just like the first difficult math contest you take often goes quite badly, the first time you try to think about learning, you will probably ask questions you will be embarrassed about in five years. My hope is that as these younger users get older and wiser, the questions and thoughts become mature as well. To this end I do not mind seeing people wobble on their first steps.
- Being honest with your own understanding, particularly of fundamentals. When watching experienced contestants, you often see people solving problems using advanced techniques like Brianchon’s theorem or the n-1 equal value principle or whatever. It’s tempting to think that if you learn the names and statements of all these advanced techniques then you’ll be able to apply them too. But the reality is that these techniques are advanced for a reason: they are hard to use without mastery of fundamentals.
This is something I definitely struggled with as a contestant: being forced to patiently learn all the fundamentals and not worry about the fancy stuff. To give an example, the 2011 JMO featured an inequality which was routine for experienced or well-trained contestants, but “almost impossible for people who either have not seen inequalities at all or just like to compile famous names in their proofs”. I was in the latter category, and tried to make up a solution using multivariable Jensen, whatever that meant. Only when I was older did I really understand what I was missing.
- Dual to the above, once you begin to master something completely you start to learn what different depths of understanding feel like, and an appreciation for just how much effort goes into developing a mastery of something.
- Being able to think about things which are not well-defined. This one often comes as a surprise to people, since math is a field which is known for its precision. But I still maintain that this is a skill contests train for.
A very simple example is a question like, “when should I use the probabilistic method?”. Yes, we know it’s good for existence questions, but can we say anything more about when we expect it to work? Well, one heuristic (not the only one) is “if a monkey could find it” — the idea that a randomly selected object “should” work. But obviously something like this can’t be subject to a (useful) formal definition that works 100% of the time, and there are plenty of contexts in which even informally this heuristic gives the wrong answer. So that’s an example of a vague and nebulous concept that’s nonetheless necessary in order to understand the probabilistic method well.
There are much more general examples one can say. What does it mean for a problem to “feel projective”? I can’t tell you a hard set of rules; you’ll have to do a bunch of examples and gain the intuition yourself. Why do I say this problem is “rigid”? Same answer. How do you tell which parts of this problem are natural, and which are artificial? How do you react if you have the feeling the problem gives you nothing to work with? How can you tell if you are making progress on a problem? Trying to figure out partial answers to these questions, even if they can’t be put in words, will go a long way in improving the mythical intuition that everyone knows is so important.
It might not be unreasonable to say that by this point we are studying philosophy, and that’s exactly what I intend. When I teach now I often make a point of referring to the “morally correct” way of thinking about things, or making a point of explaining why X should be true, rather than just providing a proof. I find this type of philosophy interesting in its own right, but that is not the main reason I incorporate it into my teaching. I teach the philosophy now because it is necessary, because you will solve fewer problems without that understanding.
4. I think if you don’t do well, it’s better to you
But I think the most surprising benefit of math contests is that most participants won’t win. In high school everyone tells you that if you work hard you will succeed. The USAMO is a fantastic counterexample to this. Every year, there are exactly 12 winners on the USAMO. I can promise you there are far more than 12 people who work very hard every year with the hope of doing well on the USAMO. Some people think this is discouraging, but I find it desirable.
Let me tell you a story.
Back in September of 2015, I sneaked in to the parents’ talk at Math Prize for Girls, because Zuming Feng was speaking and I wanted to hear what he had to say. (The whole talk is available on YouTube now.) The talk had a lot of different parts that I liked, but one of them struck me in particular, when he recounted something he said to one of his top students:
I really want you to work hard, but I really think if you don’t do well, if you fail, it’s better to you.
I had a hard time relating to this when I first heard it, but it makes sense if you think about it. What I’ve tried to argue is that the benefit of math contests is not that the contestant can now solve N problems on USAMO in late April, but what you gain from the entire year of practice. And so if you hold the other 363 days fixed, and then vary only the final outcome of the USAMO, which of success and failure is going to help a contestant develop more as a person?
For that reason I really like to think that the final lesson from high school olympiads is how to appreciate the entire journey, even in spite of the eventual outcome.
- I actually think this is one of the good arguments in favor of the new JMO/USAMO system introduced in 2010. Before this, it was not uncommon for participants in 9th and 10th grade to really only aim for solving one or two entry-level USAMO problems to qualify for MOP. To this end I think the mentality of “the cutoff will probably only be X, so give up on solving problem six” is sub-optimal.
- That’s a Zuming quote.
- Which is why I think the HMIC is actually sort of pointless from a contestant’s perspective, but it’s good logistics training for the tournament directors.
- I could be wrong about people thinking chess is a good experience, given that I don’t actually have any serious chess experience beyond knowing how the pieces move. A cursory scan of the Internet suggests otherwise (was surprised to find that Ben Franklin has an opinion on this) but it’s possible there are people who think chess is a waste of time, and are merely not as vocal as the people who think math contests are a waste of time.
- Relative to what many high school students work on, not compared to research or something.
- Privately, I think that working in math olympiads taught me way more about writing well than English class ever did; English class always felt to me like the skill of trying to sound like I was saying something substantial, even when I wasn’t.
I recently had a combinatorics paper appear in the EJC. In this post I want to brag a bit by telling the “story” of this paper: what motivated it, how I found the conjecture that I originally did, and the process that eventually led me to the proof, and so on.
This work was part of the Duluth REU 2017, and I thank Joe Gallian for suggesting the problem.
Let me begin by formulating the problem as it was given to me. First, here is the definition and notation for a “block-ascending” permutation.
It turns out that block-ascending permutations which also avoid an increasing subsequence of certain length have nice enumerative properties. To this end, we define the following notation.
(The reason for using will be explained later.) In particular, if .
Now on to the results. A 2011 paper by Joel Brewster Lewis (JBL) proved (among other things) the following result:
Just before the Duluth REU in 2017, Mei and Wang proved that in fact, in Lewis’ result one may freely mix and ‘s. To simplify notation,
The proof uses the RSK correspondence, but the authors posed at the end of the paper the following open problem:
This was the first problem that I was asked to work on. (I remember I received the problem on Sunday morning; this actually matters a bit for the narrative later.)
At this point I should pause to mention that this notation is my own invention, and did not exist when I originally started working on the problem. Indeed, all the results are restricted to the case where for each , and so it was unnecessary to think about other possibilities for : Mei and Wang’s paper uses the notation . So while I’ll continue to use the notation in the blog post for readability, it will make some of the steps more obvious than they actually were.
2. Setting out
Mei and Wang’s paper originally suggested that rather than finding a bijection for any and , it would suffice to biject
and then compose two such bijections. I didn’t see why this should be much easier, but it didn’t seem to hurt either.
As an example, they show how to do this bijection with and . Indeed, suppose . Then is an increasing sequence of length right at the start of . So had better be the largest element in the permutation: otherwise later in the biggest element would complete an ascending permutation of length already! So removing gives a bijection between .
But if you look carefully, this proof does essentially nothing with the later blocks. The exact same proof gives:
Once I found this proposition I rejected the initial suggestion of specializing . The “easy case” I had found told me that I could take a set and delete the single element from it. So empirically, my intuition from this toy example told me that it would be easier to find bijections where and were only “a little different”, and hope that the resulting bijection only changed things a little bit (in the same way that in the toy example, all the bijection did was delete one element). So I shifted to trying to find small changes of this form.
3. The fork in the road
3.1. Wishful thinking
I had a lucky break of wishful thinking here. In the notation with , I had found that one could replace with either or freely. (But this proof relied heavily on the fact that the block really is on the far left.) So what other changes might I be able to make?
There were two immediate possibilities that came to my mind.
- Deletion: We already showed could be changed from to for any . If we can do a similar deletion with for any , not just , then we would be done.
- Swapping: If we can show that two adjacent ‘s could be swapped, that would be sufficient as well. (It’s also possible to swap non-adjacent ‘s, but that would cause more disruption for no extra benefit.)
Now, I had two paths that both seemed plausible to chase after. How was I supposed to know which one to pick? (Of course, it’s possible neither works, but you have to start somewhere.)
Well, maybe the correct thing to do would have to just try both. But it was Sunday afternoon by the time I got to this point. Granted, it was summer already, but I knew that come Monday I would have doctor appointments and other trivial errands to distract me, so I decided I should pick one of them and throw the rest of the day into it. But that meant I had to pick one.
(I confess that I actually already had a prior guess: the deletion approach seemed less likely to work than the swapping approach. In the deletion approach, if is somewhere in the middle of the permutation, it seemed like deleting an element could cause a lot of disruption. But the swapping approach preserved the total number of elements involved, and so seemed more likely that I could preserve structure. But really I was just grasping at straws.)
3.2. Enter C++
Yeah, I cheated. Sorry.
Those of you that know anything about my style of math know that I am an algebraist by nature — sort of. It’s more accurate to say that I depend on having concrete examples to function. True, I can’t do complexity theory for my life, but I also haven’t been able to get the hang of algebraic geometry, despite having tried to learn it three or four times by now. But enumerative combinatorics? OH LOOK EXAMPLES.
Here’s the plan: let . Then using a C++ computer program:
- Enumerate all the permutations in .
- Enumerate all the permutations in .
- Enumerate all the permutations in .
If the deletion approach is right, then I would hope and look pretty similar. On the flip side, if the swapping approach is right, then and should look close to each other instead.
It’s moments like this where my style of math really shines. I don’t have to make decisions like the above off gut-feeling: do the “data science” instead.
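Since the exact sets involved use notation from the paper, here is a generic sketch (in C++, with function names of my own) of the kind of enumeration this takes: count the permutations of {1, …, n} whose longest increasing subsequence has length at most k. As a sanity check, permutations avoiding an increasing subsequence of length 3 are counted by the Catalan numbers.

```cpp
#include <algorithm>
#include <vector>
using namespace std;

// Length of the longest increasing subsequence, via patience sorting.
// tails[i] holds the smallest possible tail of an increasing
// subsequence of length i + 1 seen so far.
int lis(const vector<int>& p) {
    vector<int> tails;
    for (int x : p) {
        auto it = lower_bound(tails.begin(), tails.end(), x);
        if (it == tails.end()) tails.push_back(x);
        else *it = x;
    }
    return (int)tails.size();
}

// Count permutations of {1, ..., n} whose LIS has length at most k.
long long count_lis_at_most(int n, int k) {
    vector<int> p(n);
    for (int i = 0; i < n; i++) p[i] = i + 1;
    long long cnt = 0;
    do {
        if (lis(p) <= k) cnt++;
    } while (next_permutation(p.begin(), p.end()));
    return cnt;
}
```

For example, count_lis_at_most(4, 2) returns 14, the fourth Catalan number.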
3.3. A twist of fate
Except this isn’t actually what I did, since there was one problem. Computing the longest increasing subsequence of a length permutation takes time, and there are or so permutations. But when , we have , which is a pretty big number. Unfortunately, my computer is not really that fast, and I didn’t really have the patience to implement the “correct” algorithms to bring the runtime down.
The solution? Use instead.
In a deep irony that I didn’t realize at the time, it was this moment when I introduced the notation, and for the first time allowed the to not be in . My reasoning was that since I was only doing this for heuristic reasons, I could instead work with and probably not change much about the structure of the problem, while replacing , which would run times faster. This was okay since all I wanted to do was see how much changing the “middle” would disrupt the structure.
And so the new plan was:
- Enumerate all the permutations in .
- Enumerate all the permutations in .
- Enumerate all the permutations in .
I admit I never actually ran the enumeration with , because the route with and turned out to be even more promising than I expected. When I compared the empirical data for the sets and , I found that the numbers of permutations with any particular triple were equal. In other words, the outer blocks were preserved: the bijection
does not tamper with the outside blocks of length and .
This meant I was ready to make the following conjecture. Suppose , . There is a bijection
which only involves rearranging the elements of the th and st blocks.
4. Rooting out the bijection
At this point I was in quite a good position. I had pinned down the problem to finding a particular bijection that I was confident had to exist, since it was showing up in the empirical data.
Let’s call this mythical bijection . How could I figure out what it was?
4.1. Hunch: preserves order-isomorphism
Let me quickly introduce a definition.
Now I guessed one more property of the bijection: it should preserve order-isomorphism.
What do I mean by this? Suppose in one context changed to ; then we would expect that in another situation we should have changing to . Indeed, we expect (empirically) to not touch surrounding outside blocks, and so it would be very strange if behaved differently due to far-away numbers it wasn’t even touching.
So actually I’ll just write
for this example, reducing the words in question.
4.2. Keep cheating
With this hunch it’s possible to cheat with C++ again. Here’s how.
Let’s for concreteness suppose and the particular sets
Well, it turns out if you look at the data:
- The only element of which starts with and ends with is .
- The only element of which starts with and ends with is .
So that means that is changed to . Thus the empirical data shows that
In general, it might not be that clear cut. For example, if we look at the permutations starting with and , there is more than one.
- and are both in .
- and are both in .
but we can’t tell which one goes to which (although you might be able to guess).
Fortunately, there is lots of data. This example narrowed down to two values, but if you look at other places you might have different data on . Since we think is behaving the same “globally”, we can piece together different pieces of data to get narrower sets. Even better, is a bijection, so once we match either of or , we’ve matched the other.
You know what this sounds like? Perfect matchings.
So here’s the experimental procedure.
- Enumerate all permutations in and .
- Take each possible tuple , and look at the permutations that start and end with those particular four elements. Record the reductions of and for all these permutations. We call these input words and output words, respectively. Each output word is a “candidate” for an input word.
- For each input word that appeared, take the intersection of all output words that appeared. This gives a bipartite graph , with input words being matched to their candidates.
- Find perfect matchings of the graph.
And with any luck that would tell us what is.
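For the last step, a perfect matching in a sparse bipartite graph can be found with, for example, Kuhn’s augmenting-path algorithm. A minimal self-contained sketch in C++ (this is not the original program; the names are mine):

```cpp
#include <utility>
#include <vector>
using namespace std;

// Maximum bipartite matching via Kuhn's augmenting-path algorithm.
// Left vertices are input words, right vertices are candidate output words.
struct Matcher {
    vector<vector<int>> adj;   // adj[u] = right vertices adjacent to left u
    vector<int> matchR;        // matchR[v] = left vertex matched to v, or -1
    vector<char> used;

    bool augment(int u) {
        for (int v : adj[u]) {
            if (used[v]) continue;
            used[v] = 1;
            // v is free, or its partner can be rerouted elsewhere
            if (matchR[v] == -1 || augment(matchR[v])) {
                matchR[v] = u;
                return true;
            }
        }
        return false;
    }
};

int max_matching(int nL, int nR, const vector<pair<int, int>>& edges) {
    Matcher m;
    m.adj.assign(nL, {});
    for (auto& e : edges) m.adj[e.first].push_back(e.second);
    m.matchR.assign(nR, -1);
    int matched = 0;
    for (int u = 0; u < nL; u++) {
        m.used.assign(nR, 0);
        if (m.augment(u)) matched++;
    }
    return matched;
}
```

A perfect matching exists exactly when the maximum matching size equals the number of input words.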
Luckily, the bipartite graph is quite sparse, and there was only one perfect matching.
246|1357 => 2467|135
247|1356 => 2457|136
256|1347 => 2567|134
257|1346 => 2357|146
267|1345 => 2367|145
346|1257 => 3467|125
347|1256 => 3457|126
356|1247 => 3567|124
357|1246 => 1357|246
367|1245 => 1367|245
456|1237 => 4567|123
457|1236 => 1457|236
467|1235 => 1467|235
567|1234 => 1567|234
If you look at the data, well, there are some clear patterns. Exactly one number is “moving” over from the right half, each time. Also, if is on the right half, then it always moves over.
Anyways, if you stare at this for an hour, you can actually figure out the exact rule:
And indeed, once I had this bijection, it took maybe only another hour of thinking to verify that it works as advertised, thus solving the original problem.
Rather than writing up what I had found, I celebrated that Sunday evening by playing Wesnoth for 2.5 hours.
On Monday morning I was mindlessly feeding inputs to the program I had worked on earlier and finally noticed that in fact and also had the same cardinality. Huh.
It seemed too good to be true, but I played around some more, and sure enough, the cardinality of seemed to only depend on the order of the ‘s. And so at last I stumbled upon the final form of the conjecture, realizing that all along the assumption that I had been working with was a red herring, and that the bijection was really true in much vaster generality. There is a bijection
which only involves rearranging the elements of the th and st blocks.
It also meant I had more work to do, and so I was now glad that I hadn’t written up my work from yesterday night.
5.2. More data science
I re-ran the experiment I had done before, now with . (This was interesting, because the elements in question could now have either longest increasing subsequence of length , or instead of length .)
The data I obtained was:
246|13578 => 24678|135
247|13568 => 24578|136
248|13567 => 24568|137
256|13478 => 25678|134
257|13468 => 23578|146
258|13467 => 23568|147
267|13458 => 23678|145
268|13457 => 23468|157
278|13456 => 23478|156
346|12578 => 34678|125
347|12568 => 34578|126
348|12567 => 34568|127
356|12478 => 35678|124
357|12468 => 13578|246
358|12467 => 13568|247
367|12458 => 13678|245
368|12457 => 13468|257
378|12456 => 13478|256
456|12378 => 45678|123
457|12368 => 14578|236
458|12367 => 14568|237
467|12358 => 14678|235
468|12357 => 12468|357
478|12356 => 12478|356
567|12348 => 15678|234
568|12347 => 12568|347
578|12346 => 12578|346
678|12345 => 12678|345
Okay, so it looks like:
- exactly two numbers are moving each time, and
- the length of the longest run is preserved.
Eventually, I was able to work out the details, but they’re more involved than I want to reproduce here. But the idea is that you can move elements “one at a time”: something like
while preserving the length of increasing subsequences at each step.
So, together with the easy observation from the beginning, this not only resolves the original problem, but also gives an elegant generalization. I had now proved:
6. Discovered vs invented
Whenever I look back on this, I can’t help thinking just how incredibly lucky I got on this project.
There’s this perpetual debate about whether mathematics is discovered or invented. I think it’s results like this which make the case for “discovered”. I did not really construct the bijection myself: it was “already there” and I found it by examining the data. In another world where did not exist, all the creativity in the world wouldn’t have changed anything.
So anyways, that’s the behind-the-scenes tour of my favorite combinatorics paper.
I wanted to quickly write this proof up, complete with pictures, so that I won’t forget it again. In this post I’ll give a combinatorial proof (due to Joyal) of Cayley’s formula: the number of trees on n labeled vertices is n^(n−2).
Proof: We are going to construct a bijection between
- Functions f : {1, …, n} → {1, …, n} (of which there are n^n) and
- Trees on {1, …, n} with two distinguished nodes a and b (possibly a = b).
This will imply the answer.
Let’s look at the first piece of data. We can visualize it as points floating around, each with an arrow going out of it pointing to another point, but possibly with many other arrows coming into it. Such a structure is apparently called a directed pseudoforest. Here is an example when .
You’ll notice that in each component, some of the points lie in a cycle and others do not. I’ve colored the former type of points blue, and the corresponding arrows magenta.
Thus a directed pseudoforest can also be specified by
- a choice of some vertices to be in cycles (blue vertices),
- a permutation on the blue vertices (magenta arrows), and
- attachments of trees to the blue vertices (grey vertices and arrows).
Now suppose we take the same information, but replace the permutation on the blue vertices with a total ordering instead (of course there are an equal number of these). Then we can string the blue vertices together as shown below, where the green arrows denote the selected total ordering (in this case ):
This is exactly the data of a tree on the vertices with two distinguished vertices, the first and last in the chain of green (which could possibly coincide).
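As an aside, the cyclic (blue) vertices in this construction can be found mechanically: after n applications of f, every starting vertex has fallen into a cycle, so the image of f^n is exactly the cyclic set. A small C++ sketch (the function name is my own):

```cpp
#include <set>
#include <vector>
using namespace std;

// Given f : {0, ..., n-1} -> {0, ..., n-1}, return the vertices lying on a
// cycle of the functional graph.  The image of f^n is exactly this set,
// since any tail has length at most n - 1.
set<int> cyclic_vertices(const vector<int>& f) {
    int n = (int)f.size();
    set<int> cyc;
    for (int v = 0; v < n; v++) {
        int x = v;
        for (int i = 0; i < n; i++) x = f[x];  // x = f^n(v), always cyclic
        cyc.insert(x);
    }
    return cyc;
}
```

For instance, with f sending 0 → 1 → 2 → 0 and 4 → 3 → 0, the cyclic vertices are {0, 1, 2}.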
I’m reading through Primes of the Form x^2 + ny^2, by David Cox (link; it’s good!). Here are the high-level notes I took on the first chapter, which is about the theory of quadratic forms.
(Meta point re blog: I’m probably going to start posting more and more of these more high-level notes/sketches on this blog on topics that I’ve been just learning. Up til now I’ve been mostly only posting things that I understand well and for which I have a very polished exposition. But the perfect is the enemy of the good here; given that I’m taking these notes for my own sake, I may as well share them to help others.)
For example, we have the famous quadratic form x^2 + y^2.
As readers are probably aware, we can say a lot about exactly which integers can be represented by this form: by Fermat’s Christmas theorem, the primes p ≡ 1 (mod 4) (and p = 2) can all be written as the sum of two squares, while the primes p ≡ 3 (mod 4) cannot. For convenience, let us say that:
The basic question is: what can we say about which primes/integers are properly represented by a quadratic form? In fact, we will later restrict our attention to “positive definite” forms (described later).
For example, Fermat’s Christmas theorem now rewrites as: an odd prime p is properly represented by x^2 + y^2 if and only if p ≡ 1 (mod 4).
The proof of this is classical, see for example my olympiad handout. We also have the formulation for odd integers: an odd integer n > 1 is properly represented by x^2 + y^2 if and only if every prime divisor of n is congruent to 1 (mod 4).
Proof: For the “if” direction, we use the fact that x^2 + y^2 is multiplicative in the sense that (a^2 + b^2)(c^2 + d^2) = (ac − bd)^2 + (ad + bc)^2.
For the “only if” part we use the fact that if a multiple of a prime p is properly represented by x^2 + y^2, then so is p. This follows by noticing that if p divides a^2 + b^2 (and gcd(a, b) = 1) then (a b^{-1})^2 ≡ −1 (mod p).
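Fermat’s Christmas theorem is easy to sanity-check by brute force for small primes. A quick C++ sketch (the helper names here are my own, for illustration):

```cpp
#include <cmath>

// Trial-division primality test; fine for small n.
bool is_prime(int n) {
    if (n < 2) return false;
    for (int d = 2; d * d <= n; d++)
        if (n % d == 0) return false;
    return true;
}

// Does n = a^2 + b^2 have a solution in nonnegative integers?
bool sum_of_two_squares(int n) {
    for (int a = 0; a * a <= n; a++) {
        int b2 = n - a * a;
        int b = (int)std::sqrt((double)b2);
        // guard against floating-point rounding of sqrt
        while (b * b > b2) b--;
        while ((b + 1) * (b + 1) <= b2) b++;
        if (b * b == b2) return true;
    }
    return false;
}
```

Looping p over small odd primes confirms that sum_of_two_squares(p) holds exactly when p ≡ 1 (mod 4).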
Tangential remark: the two ideas in the proof will grow up in the following way.
- The fact that “multiplies nicely” will grow up to become the so-called composition of quadratic forms.
- The second fact will not generalize for an arbitrary form . Instead, we will see that if a multiple of is represented by a form then some form of the same “discriminant” will represent the prime , but this form need not be the same as itself.
2. Equivalence of forms, and the discriminant
The first thing we should do is figure out when two forms are essentially the same: for example, and should clearly be considered the same. More generally, if we think of as acting on and is any automorphism of , then should be considered the same as . Specifically,
So we generally will only care about forms up to proper equivalence. (It will be useful to distinguish between proper/improper equivalence later.)
Naturally we seek some invariants under this operation. By far the most important is the discriminant of the form f = ax^2 + bxy + cy^2, defined by D = b^2 − 4ac.
The discriminant is invariant under equivalence (check this). Note also that we always have D ≡ 0 or 1 (mod 4).
Observe that we have 4a · f(x, y) = (2ax + by)^2 − D y^2.
So if D < 0 and a > 0 (thus c > 0 too) then f(x, y) > 0 for all (x, y) ≠ (0, 0). Such quadratic forms are called positive definite, and we will restrict our attention to these forms.
Now that we have this invariant, we may as well classify equivalence classes of quadratic forms for a fixed discriminant. It turns out this can be done explicitly.
Then the big huge theorem is:
Proof: Omitted due to length, but completely elementary. It is a reduction argument with some number of cases.
Thus, for any discriminant we can consider the set
which will be the equivalence classes of positive definite forms of discriminant D. By abuse of notation we will also consider it as the set of equivalence classes of primitive positive definite forms of discriminant D.
We also define h(D) to be the number of such classes; by the exercise, h(D) is finite. This is called the class number.
Moreover, we have h(D) ≥ 1, because we can take the form x^2 − (D/4) y^2 for D ≡ 0 (mod 4) and x^2 + xy + ((1 − D)/4) y^2 for D ≡ 1 (mod 4). We call this form the principal form.
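As a concrete check on these definitions, the reduced primitive forms of a given discriminant can be enumerated directly. A brute-force C++ sketch, assuming the standard reduction conditions −a < b ≤ a ≤ c with b ≥ 0 whenever a = c (the function name is my own):

```cpp
#include <numeric>
using namespace std;

// Count reduced primitive positive definite forms ax^2 + bxy + cy^2 of
// discriminant D < 0; the count is the (form) class number h(D).
// Reduced means -a < b <= a <= c, with b >= 0 when a = c.
int class_number(int D) {
    int count = 0;
    // reduction forces 3a^2 <= |D|, so a is bounded
    for (int a = 1; 3 * a * a <= -D; a++) {
        for (int b = -a + 1; b <= a; b++) {
            long long num = (long long)b * b - D;   // 4ac = b^2 - D
            if (num % (4LL * a) != 0) continue;     // c must be an integer
            long long c = num / (4 * a);
            if (c < a) continue;                    // need a <= c
            if (a == c && b < 0) continue;          // tie-breaking condition
            int ab = b < 0 ? -b : b;
            if (gcd(gcd(a, ab), (int)c) != 1) continue;  // primitive only
            count++;
        }
    }
    return count;
}
```

For instance, this reports h(−4) = 1, matching the fact that x^2 + y^2 is the only reduced form of discriminant −4.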
3. Tables of quadratic forms
4. The Character
We can now connect this to primes as follows. Earlier we played with x^2 + y^2, and observed that for odd primes p, we have p ≡ 1 (mod 4) if and only if some multiple of p is properly represented by x^2 + y^2.
Our generalization is as follows:
This generalizes our result for x^2 + y^2, but note that it uses h(−4) = 1 in an essential way! That is: if p ≡ 1 (mod 4), we know p is represented by some quadratic form of discriminant −4 … but only since h(−4) = 1 do we know that this form reduces to x^2 + y^2.
Proof: First assume WLOG that and . Thus , since otherwise this would imply . Then
The converse direction is amusing: let for integers , . Consider the quadratic form
It is primitive of discriminant and . Now may not be reduced, but that’s fine: just take the reduction of , which must also properly represent .
Thus to every discriminant $D$ we can attach the Legendre character (is that the name?), which is a homomorphism $\chi_D \colon (\mathbb{Z}/D)^\times \to \{\pm 1\}$ with the property that if $p$ is a rational prime not dividing $D$, then $\chi_D(p) = \left(\frac{D}{p}\right)$. This is abuse of notation since I should technically write $\chi_D(p \bmod D)$, but there is no harm done: one can check by quadratic reciprocity that if $p \equiv q \pmod{D}$ then $\left(\frac{D}{p}\right) = \left(\frac{D}{q}\right)$. Thus our previous result becomes:
As a corollary of this, using the fact that one can prove that
Proof: The congruence conditions are equivalent to $\chi_D(p) = 1$, and as before the only point is that the only reduced quadratic form for these discriminants is the principal one.
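For instance, $D = -4$ has $h(-4) = 1$ with principal form $x^2 + y^2$, so the corollary specializes to Fermat's two-square theorem: an odd prime is $x^2 + y^2$ exactly when $p \equiv 1 \pmod 4$. A brute-force check (helper names are mine):

```python
from math import isqrt

def is_prime(n):
    """Trial division; fine for small n."""
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))

def is_sum_of_two_squares(n):
    """Does n = x^2 + y^2 have an integer solution?"""
    return any(isqrt(n - x * x) ** 2 == n - x * x for x in range(isqrt(n) + 1))

# Fermat's two-square theorem, i.e. the h(-4) = 1 case of the corollary:
for p in range(3, 500, 2):
    if is_prime(p):
        assert is_sum_of_two_squares(p) == (p % 4 == 1)
```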
5. Genus theory
What if $h(D) > 1$? Sometimes, we can still figure out which primes go where just by taking mods.
Let $f$ be a primitive positive definite form of discriminant $D$. Then it represents some residue classes of $(\mathbb{Z}/D)^\times$. We call the set of residue classes represented the genus of the quadratic form $f$.
The thing that makes this work is that each genus appears exactly once. We are not always so lucky: for example when we have that
We now prove that:
Proof: For the first part, we aim to show the principal genus is multiplicatively closed. For $D \equiv 0 \pmod 4$, we use the fact that $(x^2 + ny^2)(z^2 + nw^2) = (xz - nyw)^2 + n(xw + yz)^2$, where $n = -D/4$.
For $D \equiv 1 \pmod 4$, we instead appeal to another “magic” identity
and it follows from here that the principal genus is actually the set of squares in $(\mathbb{Z}/D)^\times$, which is obviously a subgroup.
Now we show that other quadratic forms have genus equal to a coset of the principal genus. For , with we can write
and thus the desired coset is shown to be . As for , we have
so the desired coset is again the same, since the principal genus was the set of squares.
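These computations can be checked by brute force: the genus of a form is just the set of units of $\mathbb{Z}/|D|$ that it represents. A sketch (the helper `genus` is my own) for $D = -20$, whose two reduced forms are $x^2 + 5y^2$ and $2x^2 + 2xy + 3y^2$:

```python
from math import gcd

def genus(a, b, c):
    """Residue classes in (Z/|D|)^* represented by ax^2 + bxy + cy^2."""
    m = abs(b * b - 4 * a * c)  # |D|
    vals = set()
    for x in range(m):          # values are periodic mod |D| in x and y
        for y in range(m):
            v = (a * x * x + b * x * y + c * y * y) % m
            if gcd(v, m) == 1:
                vals.add(v)
    return vals

# D = -20: reduced forms are x^2 + 5y^2 (principal) and 2x^2 + 2xy + 3y^2.
squares = {x * x % 20 for x in range(20) if gcd(x, 20) == 1}
assert genus(1, 0, 5) == squares == {1, 9}  # principal genus: the squares here
assert genus(2, 2, 3) == {3, 7}             # the other genus, the coset 3 * {1, 9}
```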
Thus every genus is a coset of the principal genus in $(\mathbb{Z}/D)^\times$. Hence:
Thus there is a natural map
(The map is surjective by Theorem 14.) We also remark that this map is quite well-behaved:
Proof: Observe that the principal genus contains all the squares of $(\mathbb{Z}/D)^\times$: if $f$ is the principal form then $f(x, 0) = x^2$. Thus each element of the quotient has order at most $2$, which implies the result since it is a finite abelian group.
In fact, one can compute the order exactly, but for this post I will just state the result.
We have already used once the nice identity
We are going to try and generalize this for any two quadratic forms in . Specifically,
In fact, without the latter two constraints we would instead have and , and each choice of signs would yield one of four (possibly different) forms. So requiring both signs to be positive makes this operation well-defined. (This is why we like proper equivalence; it gives us a well-defined group structure, whereas with improper equivalence it would be impossible to put a group structure on the forms above.)
Taking this for granted, we then have that
We then have a group homomorphism
Observe that $(a, b, c)$ and $(a, -b, c)$ are inverses and that their images coincide (being improperly equivalent); this is expressed in the fact that the genus group has elements of order at most $2$. As another corollary, the number of elements of $\mathrm{Cl}(D)$ with a given genus is always a power of two.
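Both phenomena are easy to observe numerically. The sketch below (helper names are my own) groups the reduced forms of $D = -56$, i.e. $n = 14$, by genus: there are four classes but only two genera, each containing exactly two classes, which is why congruence conditions alone cannot separate $x^2 + 14y^2$ from $2x^2 + 7y^2$.

```python
from math import gcd, isqrt
from collections import defaultdict

def reduced_forms(D):
    """Reduced primitive positive definite forms of discriminant D < 0."""
    forms = []
    for a in range(1, isqrt(-D // 3) + 1):
        for b in range(-a, a + 1):
            if (b * b - D) % (4 * a):
                continue
            c = (b * b - D) // (4 * a)
            if c >= a and not (b < 0 and (-b == a or a == c)):
                if gcd(gcd(a, b), c) == 1:
                    forms.append((a, b, c))
    return forms

def genus(a, b, c):
    """Units of Z/|D| represented by the form, where D = b^2 - 4ac."""
    m = abs(b * b - 4 * a * c)
    vals = set()
    for x in range(m):
        for y in range(m):
            v = (a * x * x + b * x * y + c * y * y) % m
            if gcd(v, m) == 1:
                vals.add(v)
    return frozenset(vals)

# D = -56 (n = 14): four classes, but only two genera.
by_genus = defaultdict(list)
for f in reduced_forms(-56):
    by_genus[genus(*f)].append(f)

assert genus(1, 0, 14) == genus(2, 0, 7)  # same genus, different classes!
assert sorted(len(v) for v in by_genus.values()) == [2, 2]  # a power of two each
```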
We now define:
Thus we arrive at the following corollary:
Hence the representability depends only on .
OEIS A000926 lists 65 convenient numbers. This sequence is known to be complete except for at most one more number; moreover, the list is complete assuming the generalized Riemann hypothesis.
7. Cubic and quartic reciprocity
To treat the cases where $n$ is not convenient, the correct thing to do is develop class field theory. However, we can still make a little bit more progress if we bring higher reciprocity theorems to bear: we’ll handle the cases $n = 27$ and $n = 64$, two examples of numbers which are not convenient.
7.1. Cubic reciprocity
First, we prove that
To do this we use cubic reciprocity, which requires working in the Eisenstein integers $\mathbb{Z}[\omega]$, where $\omega$ is a cube root of unity. There are six units in $\mathbb{Z}[\omega]$ (the sixth roots of unity), hence each nonzero number has six associates (differing by a unit), and the ring is in fact a PID.
Now if we let $\pi$ be a prime not dividing $3$, and $\alpha$ is coprime to $\pi$, then we can define the cubic Legendre symbol $\left(\frac{\alpha}{\pi}\right)_3 \in \{1, \omega, \omega^2\}$ by setting $\left(\frac{\alpha}{\pi}\right)_3 \equiv \alpha^{(N\pi - 1)/3} \pmod{\pi}.$
Moreover, we can define a primary prime to be one such that $\pi \equiv -1 \pmod 3$; given any prime $\pi$ not dividing $3$, exactly one of the six associates is primary. We then have the following reciprocity theorem:
The first supplementary law is for the unit $\omega$ (analogous to $\left(\frac{-1}{p}\right)$) while the second reciprocity law handles the prime divisors of $3$ (analogous to $\left(\frac{2}{p}\right)$).
We can tie this back into $\mathbb{Z}$ as follows. If $p \equiv 1 \pmod 3$ is a rational prime then it is represented by $x^2 + xy + y^2$, and thus we can put $p = \pi \overline{\pi}$ for some prime $\pi \in \mathbb{Z}[\omega]$, $N\pi = p$. Consequently, we have a natural isomorphism $\mathbb{Z}[\omega]/(\pi) \cong \mathbb{Z}/p.$
Therefore, we see that a given $a$ is a cubic residue modulo $p$ if and only if $\left(\frac{a}{\pi}\right)_3 = 1$.
In particular, we have the following corollary, which is all we will need:
Proof: By cubic reciprocity:
Now we give the proof of Theorem 27. Proof: First assume
Let $\pi$ be primary, noting that . Now clearly , so we are done by the corollary.
For the converse, assume , with primary and . If we set for integers and , then the fact that and is enough to imply that (check it!). Moreover,
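Although the symbols above live in $\mathbb{Z}[\omega]$, the final criterion can be tested with rational arithmetic alone. Assuming the theorem of this section is the classical one (for a prime $p \equiv 1 \pmod 3$, we have $p = x^2 + 27y^2$ if and only if $2$ is a cubic residue mod $p$, i.e. $2^{(p-1)/3} \equiv 1 \pmod p$), a brute-force check with helper names of my own:

```python
from math import isqrt

def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))

def rep_x2_27y2(p):
    """Does p = x^2 + 27*y^2 have an integer solution?"""
    y = 0
    while 27 * y * y <= p:
        r = p - 27 * y * y
        if isqrt(r) ** 2 == r:
            return True
        y += 1
    return False

# For p = 1 (mod 3):  p = x^2 + 27y^2  <=>  2^((p-1)/3) = 1 (mod p).
for p in range(7, 1000):
    if is_prime(p) and p % 3 == 1:
        assert rep_x2_27y2(p) == (pow(2, (p - 1) // 3, p) == 1)
```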
7.2. Quartic reciprocity
This time we work in $\mathbb{Z}[i]$, for which there are four units $\pm 1$, $\pm i$. A prime $\pi$ is primary if $\pi \equiv 1 \pmod{(1+i)^3}$; every prime not dividing $2$ has a unique associate which is primary. Then we can as before define $\left(\frac{\alpha}{\pi}\right)_4 \equiv \alpha^{(N\pi - 1)/4} \pmod{\pi},$ where $\pi$ is primary, and $\alpha$ is nonzero mod $\pi$. As before, a given $a$ is a quartic residue modulo $p$ if and only if $\left(\frac{a}{\pi}\right)_4 = 1$, thanks to the isomorphism $\mathbb{Z}[i]/(\pi) \cong \mathbb{Z}/p$ where $p = \pi \overline{\pi}$.
Now we have
Again, the first law handles units, and the second law handles the prime divisors of $2$. The corollary we care about this time in fact uses only the supplementary laws:
Proof: Note that and applying the above. Therefore
Now we assumed is primary. We claim that
Note that is divisible by , and hence divides . Thus
since is odd and is even. Finally,
From here we quickly deduce
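As with the cubic case, the resulting criterion can be checked by brute force. Assuming the conclusion is the classical one (for a prime $p \equiv 1 \pmod 4$, we have $p = x^2 + 64y^2$ if and only if $2$ is a quartic residue mod $p$, i.e. $2^{(p-1)/4} \equiv 1 \pmod p$), a sketch with helper names of my own:

```python
from math import isqrt

def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))

def rep_x2_64y2(p):
    """Does p = x^2 + 64*y^2 have an integer solution?"""
    y = 0
    while 64 * y * y <= p:
        r = p - 64 * y * y
        if isqrt(r) ** 2 == r:
            return True
        y += 1
    return False

# For p = 1 (mod 4):  p = x^2 + 64y^2  <=>  2^((p-1)/4) = 1 (mod p).
for p in range(5, 1000):
    if is_prime(p) and p % 4 == 1:
        assert rep_x2_64y2(p) == (pow(2, (p - 1) // 4, p) == 1)
```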