Pictures, thoughts, and other festivities from the 2019 Romania Masters in Math. See also the MAA press release. Summary: Po-Shen Loh and I spent the last week in Bucharest with the United States team for the 11th RMM. The USA …
Careful readers of my blog might have heard about plans to have a second edition of Napkin out by the end of February. As it turns out I was overly ambitious, and (seeing that I am spending the next week …
There’s a recent working paper by economists Ruchir Agarwal and Patrick Gaule which I think would be of much interest to this readership: a systematic study of IMO performance versus success as a mathematician later on.
Despite the click-baity title and dreamy introduction about the Millennium Prizes, the rest of the paper is fascinating, and the figures section is a gold mine. Here are two that stood out to me:
There’s also one really nice idea they had, which was to investigate the effect of getting one point less than a gold medal, versus getting exactly a gold medal. This is a pretty clever way to account for the effect of the prestige of the IMO, since “IMO gold” sounds so much better on a CV than “IMO silver”, even though in any given year they may not differ by much. To my surprise, the authors found that “being awarded a better medal appears to have no additional impact on becoming a professional mathematician or future knowledge production”. I’ve included the relevant graph below.
The data used in the paper spans from IMO 1981 to IMO 2000. This is before the rise of Art of Problem Solving and the Internet (and the IMO was smaller back then, anyways), so I imagine these graphs might look different if we did them in 2040 using IMO 2000 – IMO 2020 data, although I’m not even sure whether I expect the effects to be larger or smaller.
(As usual: I do not mean to suggest that non-IMO participants cannot do well in math later. This is so that I do not get flooded with angry messages like last time.)
In the previous post we defined the $p$-adic numbers. This post will state (mostly without proof) some more surprising results about continuous functions $f \colon \mathbb{Z}_p \to \mathbb{Q}_p$. Then we give the famous proof of the Skolem-Mahler-Lech theorem using $p$-adic analysis.
1. Digression on $\mathbb{C}_p$
Before I go on, I want to mention that $\mathbb{Q}_p$ is not algebraically closed. So, we can take its algebraic closure $\overline{\mathbb{Q}_p}$ — but this field is now no longer complete (in the topological sense). However, we can then take the completion of this space to obtain $\mathbb{C}_p$. In general, completing an algebraically closed field remains algebraically closed, and so $\mathbb{C}_p$ is a larger space which is both algebraically closed and complete. This space is called the field of $p$-adic complex numbers.
We won’t need $\mathbb{C}_p$ at all in what follows, so you can forget everything you just read.
2. Mahler coefficients: a description of continuous functions on $\mathbb{Z}_p$
One of the big surprises of $p$-adic analysis is that we can concretely describe all continuous functions $f \colon \mathbb{Z}_p \to \mathbb{Q}_p$. They are given by a basis of functions $\binom{x}{n} = \frac{x(x-1)\cdots(x-n+1)}{n!}$
in the following way.
The $a_n$ are called the Mahler coefficients of $f$.
You’ll note that these are the same finite differences that one uses on polynomials in high school math contests, which is why they are also called “Mahler differences”.
Thus one can think of Mahler’s theorem as saying that the values $f(0)$, $f(1)$, $f(2)$, $\dots$ behave like a polynomial modulo $p^e$ for every $e \ge 1$. Amusingly, this fact was used on a USA TST in 2011:
3. Analytic functions
We say that a function $f \colon \mathbb{Z}_p \to \mathbb{Q}_p$ is analytic if it has a power series expansion
As before there is a characterization in terms of the Mahler coefficients:
Just as nonzero holomorphic functions have finitely many zeros on compact sets, we have the following result (Strassman’s theorem) on analytic functions on $\mathbb{Z}_p$.
We close off with an application of the analyticity results above.
Proof: According to the theory of linear recurrences, there exists a matrix such that we can write as a dot product
Let $p$ be a prime not dividing $\det M$. Let $N$ be an integer such that $M^N \equiv \mathrm{id} \pmod p$.
Fix any . We will prove that either all the terms
are zero, or at most finitely many of them are. This will conclude the proof.
Let $M^N = \mathrm{id} + pA$ for some integer matrix $A$. We have
Thus we have written $f$ in Mahler form. Initially, $f$ is defined only at nonnegative integer inputs, but by Mahler’s theorem (since the coefficients tend to zero $p$-adically) it follows that $f$ extends to a continuous function $\mathbb{Z}_p \to \mathbb{Q}_p$. Also, we can check that the $n$th Mahler coefficient is divisible by $p^n$, hence $f$ is even analytic.
Thus by Strassman’s theorem, is either identically zero, or else it has finitely many zeros, as desired.
I think this post is more than two years late in coming, but anyhow…
This post introduces the $p$-adic integers $\mathbb{Z}_p$, and the $p$-adic numbers $\mathbb{Q}_p$. The one-sentence description is that these are “integers/rationals carrying full mod $p^e$ information” (and only that information).
The first four sections will cover the founding definitions culminating in a short solution to a USA TST problem.
In this whole post, $p$ is always a prime. Much of this is based off of Chapter 3A from Straight from the Book.
Before really telling you what $\mathbb{Z}_p$ and $\mathbb{Q}_p$ are, let me tell you what you might expect them to do.
In elementary/olympiad number theory, we’re already well-familiar with the following two ideas:
- Taking modulo a prime $p$ or a prime power $p^e$, and
- Looking at the exponent $\nu_p$.
Let me expand on the first point. Suppose we have some Diophantine equation. In olympiad contexts, one can take an equation modulo $p$ to gain something else to work with. Unfortunately, taking modulo $p$ loses some information: the reduction $\mathbb{Z} \to \mathbb{Z}/p$ is far from injective.
If we want finer control, we could consider instead taking modulo $p^e$, rather than taking modulo $p$. This can also give some new information (cubes modulo $9$, anyone?), but it has the disadvantage that $\mathbb{Z}/p^e$ isn’t a field, so we lose a lot of the nice algebraic properties that we had if we took modulo $p$.
One of the goals of $p$-adic numbers is to get around these two issues I described. The $p$-adic numbers we introduce are going to have the following properties:
- You can “take modulo $p^e$ for all $e$ at once”. In olympiad contexts, we are used to picking a particular modulus and then seeing what happens if we take that modulus. But with $p$-adic numbers, we won’t have to make that choice. An equation of $p$-adic numbers carries enough information to take modulo $p^e$ for every $e$.
- The numbers form a field $\mathbb{Q}_p$, the nicest possible algebraic structure: division makes sense. Contrast this with $\mathbb{Z}/p^e$, which is not even an integral domain.
- It doesn’t lose as much information as taking modulo $p^e$ does: rather than the surjective map $\mathbb{Z} \to \mathbb{Z}/p^e$ we have an injective map $\mathbb{Z} \hookrightarrow \mathbb{Z}_p$.
- Despite this, you “ignore” some “irrelevant” data. Just like taking modulo $p^e$, you want to zoom in on a particular type of algebraic information, and this means necessarily losing sight of other things. (To draw an analogy: an equation setting a sum of squares equal to a negative number has no integer solutions, because, well, squares are nonnegative. But such an equation will have solutions modulo any prime $p$, because once you take modulo $p$ you stop being able to talk about numbers being nonnegative. The same thing will happen if we work in $p$-adics: such an equation has a $p$-adic solution for every prime $p$.)
So, you can think of $p$-adic numbers as the right tool to use if you only really care about modulo $p^e$ information, but normal modular arithmetic isn’t quite powerful enough.
To be more concrete, I’ll give a poster example now:
Here is a problem where we clearly only care about mod-$p$ information. Yet it’s a nontrivial challenge to do the necessary manipulations mod $p$ (try it!). The basic issue is that there is no good way to deal with the denominators modulo $p^e$ (in part because $\mathbb{Z}/p^e$ is not even an integral domain).
However, with $p$-adic analysis we’re going to be able to overcome these limitations and give a “straightforward” proof by using the identity
Such an identity makes no sense over $\mathbb{Q}$ or $\mathbb{R}$ for convergence reasons, but it will work fine over the $p$-adics, which is all we need.
2. Algebraic perspective
We now construct $\mathbb{Z}_p$ and $\mathbb{Q}_p$. I promised earlier that a $p$-adic integer will let you look at “all residues modulo $p^e$” at once. This definition will formalize this.
2.1. Definition of $\mathbb{Z}_p$
In this way we get an injective map $\mathbb{Z} \hookrightarrow \mathbb{Z}_p$, which is not surjective. So there are more $p$-adic integers than usual integers.
(Remark for experts: those of you familiar with category theory might recognize that this definition can be written concisely as $\mathbb{Z}_p = \varprojlim_e \mathbb{Z}/p^e$,
where the inverse limit is taken across $e \ge 1$.)
2.2. Base $p$ expansion
Here is another way to think about $p$-adic integers using “base $p$”. As in the example earlier, every usual integer can be written in base $p$.
More generally, given any $x \in \mathbb{Z}_p$, we can write down a “base $p$” expansion, in the sense that there are exactly $p$ choices of $x \pmod{p^{e+1}}$ given $x \pmod{p^e}$. In general we can write
$x = \cdots + a_2 p^2 + a_1 p + a_0$
where $0 \le a_k \le p-1$, such that the equation holds modulo $p^e$ for each $e$. Note the expansion is infinite to the left, which is different from what you’re used to.
(Amusingly, negative integers also have infinite base $p$ expansions: for example $-1 = \cdots + (p-1)p^2 + (p-1)p + (p-1)$, corresponding to $-1 \equiv p^e - 1 \pmod{p^e}$ for every $e$.)
Thus you may often hear the advertisement that a $p$-adic integer is a “possibly infinite base $p$ expansion”. This is correct, but later on we’ll be thinking of $\mathbb{Z}_p$ in a more and more “analytic” way, and so I prefer to think of this as a “Taylor series with base $p$”. Indeed, much of your intuition from generating functions (where the coefficients lie in a field) will carry over to $\mathbb{Z}_p$.
Here is one way in which your intuition from generating functions carries over:
Contrast this with the corresponding statement for formal power series: a generating function is invertible iff its constant term is nonzero.
Proof: If $x \equiv 0 \pmod p$ then $xy \equiv 0 \pmod p$ for every $y$, so $x$ is clearly not invertible. Otherwise, $x \not\equiv 0 \pmod{p^e}$ for all $e$, so we can take an inverse $y_e$ modulo $p^e$, with $x y_e \equiv 1 \pmod{p^e}$. As the $y_e$ are themselves compatible, the element $y = (y_e)$ is an inverse.
With this observation, here is now the definition of $\mathbb{Q}_p$.
Continuing our generating functions analogy:
This means $\mathbb{Q}_p$ is “Laurent series with base $p$”, and in particular according to the earlier proposition we deduce:
Thus, continuing our base $p$ analogy, elements of $\mathbb{Q}_p$ are in bijection with “Laurent series”
$x = \sum_{k \ge -n} a_k p^k$
for $0 \le a_k \le p-1$. So the base $p$ representations of elements of $\mathbb{Q}_p$ can be thought of as the same as usual, but extending infinitely far to the left (rather than to the right).
(Fair warning: the field $\mathbb{Q}_p$ has characteristic zero, not $p$.)
(At this point I want to make a remark connecting division by $p$ to the wish-list of properties I had before. In elementary number theory you can take equations modulo $p$, but if you do, a quantity like $\frac np$ doesn’t make sense unless you know $n$ modulo $p^2$. You can’t fix this by just taking modulo $p^2$, since then you need modulo $p^3$, ad infinitum. You can work around issues like this, but the nice feature of $\mathbb{Z}_p$ and $\mathbb{Q}_p$ is that you have modulo $p^e$ information for “all $e$ at once”: the information of a single $p$-adic number packages all the modulo $p^e$ information simultaneously. So you can divide by $p$ with no repercussions.)
3. Analytic perspective
Up until now we’ve been thinking about things mostly algebraically, but moving forward it will be helpful to start using the language of analysis. Usually, two real numbers are considered “close” if they are close on the number line, but for $p$-adic purposes we only care about modulo information. So, we’ll instead think of two elements of $\mathbb{Z}_p$ or $\mathbb{Q}_p$ as “close” if their difference is divisible by a large power of $p$.
For this we’ll borrow the familiar $\nu_p$ from elementary number theory.
This fulfills the promise that $x$ and $y$ are close if they look the same modulo $p^e$ for large $e$; in that case $\nu_p(x-y)$ is large and accordingly $|x-y|_p$ is small.
3.2. Ultrametric space
In this way, $\mathbb{Z}_p$ and $\mathbb{Q}_p$ become metric spaces with the metric given by $d(x,y) = |x-y|_p$.
In fact, these spaces satisfy a stronger form of the triangle inequality than you are used to from $\mathbb{R}$.
However, $\mathbb{Q}_p$ is more than just a metric space: it is a field, with its own addition and multiplication. This means we can do analysis just like in $\mathbb{R}$ or $\mathbb{C}$: basically, any notion such as “continuous function”, “convergent series”, et cetera has a $p$-adic analog. In particular, we can define what it means for an infinite sum to converge:
With this definition in place, the “base $p$” discussion we had earlier is now true in the analytic sense: if $0 \le a_k \le p-1$ then the series $\sum_{k \ge 0} a_k p^k$ converges in $\mathbb{Z}_p$.
Indeed, the $n$th partial sum differs from any later partial sum by a multiple of $p^n$, hence the partial sums form a Cauchy sequence and approach the limit as $n \to \infty$.
While the definitions are all the same, there are some changes in properties that should be true. For example, in $\mathbb{Q}_p$ convergence of partial sums is simpler:
Contrast this with the harmonic series in $\mathbb{R}$. You can think of this as a consequence of the strong triangle inequality. Proof: By multiplying by a large enough power of $p$, we may assume every term lies in $\mathbb{Z}_p$. (This isn’t actually necessary, but makes the notation nicer.)
Observe that the partial sums modulo $p$ must eventually stabilize, since for large enough $k$ the terms satisfy $x_k \equiv 0 \pmod p$. So let $r_1$ be the eventual residue modulo $p$ of the partial sums. In the same way let $r_2$ be the eventual residue modulo $p^2$, and so on. Then one can check we approach the limit $r = (r_1, r_2, \dots)$.
Here are a couple of exercises to get you used to thinking of $\mathbb{Z}_p$ and $\mathbb{Q}_p$ as metric spaces.
3.3. More fun with geometric series
While we’re at it, let’s finally state the $p$-adic analog of the geometric series formula.
Proof: Note that the partial sums satisfy $1 + p + \cdots + p^{n-1} = \frac{1-p^n}{1-p}$, and $p^n \to 0$ as $n \to \infty$ since $|p^n|_p = p^{-n}$.
So, $1 + p + p^2 + \cdots = \frac{1}{1-p}$ really is a correct convergence in $\mathbb{Q}_p$. And so on.
If you buy the analogy that $\mathbb{Z}_p$ is generating functions with base $p$, then all the olympiad generating functions you might be used to have $p$-adic analogs. For example, you can prove more generally that:
(I haven’t defined the relevant exponentiation yet, but it has the properties you expect.) The proof is as in the real case; even the theorem statement is the same except for the extra subscript of $p$. I won’t elaborate too much on this now, since $p$-adic exponentiation will be described in much more detail in the next post.
Note that the definition of $\nu_p$ could have been given for $\mathbb{Q}$ as well; we didn’t need $\mathbb{Q}_p$ to introduce it (after all, we have it in olympiads already). The big important theorem I must state now is:
This is the definition of $\mathbb{Q}_p$ you’ll see more frequently; one then defines $\mathbb{Z}_p$ in terms of $\mathbb{Q}_p$ (rather than vice-versa) according to
$\mathbb{Z}_p = \{ x \in \mathbb{Q}_p : |x|_p \le 1 \}.$
Let me justify why this definition is philosophically nice.
Suppose you are a numerical analyst and you want to estimate the value of the sum
to within some prescribed accuracy. The sum consists entirely of rational numbers, so the problem statement would be fair game for ancient Greece. But it turns out that in order to get a good estimate, it really helps if you know about the real numbers: because then you can construct the corresponding infinite series, and deduce that the partial sum is close to the limit of that series, up to some small error term from the remaining terms, which can be bounded.
Of course, in order to have access to enough theory to evaluate that limit, you need to have the real numbers; it’s impossible to do serious analysis in the non-complete space $\mathbb{Q}$, where e.g. the sequence $3$, $3.1$, $3.14$, $3.141$, $\dots$ is considered “not convergent” because $\pi \notin \mathbb{Q}$. Instead, all analysis is done in the completion of $\mathbb{Q}$, namely $\mathbb{R}$.
Now suppose you are an olympiad contestant and want to estimate the sum
to within mod $p^e$ (i.e. to within $p^{-e}$ in $\lvert \cdot \rvert_p$). Even though the quantity in question is a rational number, it still helps to be able to do analysis with infinite sums, and then bound the error term (i.e. take mod $p^e$). But the space $\mathbb{Q}$ is not complete with respect to $\lvert \cdot \rvert_p$ either, and thus it makes sense to work in the completion of $\mathbb{Q}$ with respect to $\lvert \cdot \rvert_p$. This is exactly $\mathbb{Q}_p$.
4. Solving USA TST 2002/2
Let’s finally solve Example 1, which asks us to compute
Armed with the generalized binomial theorem, this becomes straightforward.
Using a couple of elementary congruences, this solves the problem.
Some of the stuff covered in this handout:
- Advice for constructing the triangle centers (hint: circumcenter goes first)
- An example of how to rearrange the conditions of a problem and draw a diagram out-of-order
- Some mechanical suggestions such as dealing with phantom points
- Some examples of computer-generated figures
One of the major headaches of using complex numbers in olympiad geometry problems is dealing with square roots. In particular, it is nontrivial to express the incenter of a triangle inscribed in the unit circle in terms of its vertices.
The following lemma is the standard way to set up the arc midpoints of a triangle. It appears for example as part (a) of Lemma 6.23.
Theorem 1 is often used in combination with the following lemma, which lets one assign the incenter the coordinates $-(xy+yz+zx)$ in the above notation.
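For readers without the references at hand, here is the setup as I recall it; treat the exact sign conventions as an assumption to be checked against the original statements:

```latex
% With triangle ABC inscribed in the unit circle, one may choose square
% roots x, y, z with a = x^2, b = y^2, c = z^2 such that the arc midpoints
% opposite A, B, C are
\[ M_A = -yz, \qquad M_B = -zx, \qquad M_C = -xy, \]
% and then the incenter is
\[ I = -(xy + yz + zx). \]
```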
Unfortunately, the proof of Theorem 1 in my textbook is wrong, and I cannot find a proof online (though I hear that Lemmas in Olympiad Geometry has a proof). So in this post I will give a correct proof of Theorem 1, which will hopefully also explain the mysterious introduction of the minus signs in the theorem statement. In addition I will give a version of the theorem valid for quadrilaterals.
2. A Word of Warning
I should at once warn the reader that Theorem 1 is an existence result, and thus must be applied carefully.
To see why this matters, consider the following problem, which appeared as problem 1 of the 2016 JMO.
By experimenting with the diagram, it is not hard to guess that the correct fixed point is the midpoint of arc , as seen in the figure below. One might be tempted to write , , , and assert the two incenters are and , and that the fixed point is .
This is a mistake! If one applies Theorem 1 twice, then the choices of “square roots” of the common vertices and may not be compatible. In fact, they cannot be compatible, because the arc midpoint of opposite is different from the arc midpoint of opposite .
In fact, I claim this is not a minor issue that one can work around. This is because the claim that the circumcircle of passes through the midpoint of arc is false if lies on the arc on the same side as ! In that case it actually passes through instead. Thus the truth of the problem really depends on the fact that the quadrilateral is convex, and any attempt with complex numbers must take this into account to have a chance of working.
3. Proof of the theorem for triangles
Fix the triangle now, so we require $x^2 = a$, $y^2 = b$, $z^2 = c$. There are $2^3 = 8$ choices of square roots $x$, $y$, $z$ we can take (each determined up to a sign); we wish to show one of them works.
We pick an arbitrary choice for $x$ first. Then, of the two choices of $y$, we pick the one such that $-xy$ is the correct arc midpoint. Similarly, of the two choices of $z$, we pick the one such that $-zx$ is the correct arc midpoint. Our goal is to show that under these conditions, $-yz$ is the correct arc midpoint again.
The main trick is to now consider the arc midpoint , which we denote by . It is easy to see that:
Thus, we have
From this we can see why the minus signs are necessary.
4. A version for quadrilaterals
We now return to the setting of a convex quadrilateral that we encountered in Example 3. Suppose we preserve the variables , , that we were given from Theorem 1, but now add a fourth complex number with . How are the new arc midpoints determined? The following theorem answers this question.
This setup is summarized in the following figure.
Note that unlike Theorem 1, the four arcs cut out by the sides of do not all have the same sign (I chose to have coordinates ). This asymmetry is inevitable (see if you can understand why from the proof below).
Proof: We select , , with Theorem 1. Now, pick a choice of such that is the arc midpoint of not containing and . Then the arc midpoint of not containing or is given by
On the other hand, the calculation for the next arc midpoint follows by applying Lemma 4 again (applied to the corresponding triangle). The remaining midpoint is computed similarly.
In other problems, the four vertices of the quadrilateral may play more symmetric roles and in that case it may be desirable to pick a setup in which the four vertices are labeled in order. By relabeling the letters in Theorem 6 one can prove the following alternate formulation.
To test the newfound theorem, here is a cute easy application.
Median Putnam contestants, willing to devote one of the last Saturdays before final exams to a math test, are likely to receive an advanced degree in the sciences. It is counterproductive on many levels to leave them feeling like total idiots.
— Bruce Reznick, “Some Thoughts on Writing for the Putnam”
Last February I made a big public apology for having caused one of the biggest scoring errors in HMMT history, causing a lot of changes to the list of top individual students. Pleasantly, I got some nice emails from coaches who reminded me that most students and teams do not place highly in the tournament, and at the end of the day the most important thing is that the contestants enjoyed the tournament.
So now I decided I have to apologize for 2016, too.
The story this time is that I inadvertently sent over 100 students home having solved two or fewer problems total, out of 30 individual problems. That year, I was the problem czar for HMMT February 2016, and like many HMMT problem czars before me, had vastly underestimated the difficulty of my own problems.
I think stories like this are a lot worse than people realize; contests are supposed to be a learning experience for the students, and if a teenager shows up to Massachusetts and spends an entire Saturday feeling hopeless for the entire contest, then the flight back to California is going to feel very long. Now imagine having 100 students go through this every single February.
So today I’d like to say a bit about things I’ve picked up since then that have helped me avoid making similar mistakes. I actually think people generally realize that HMMT is too hard, but are wrong about how this should be fixed. In particular, I think the common approach (and the one I took) of “make problem 1 so easy that almost nobody gets a zero” is wrong, and I’ll explain here what I think should be done instead.
1. Gettable, not gimme
I think just “easy” is the wrong way to think about the beginning problems. At ARML, the problem authors use a finer distinction which I really like:
- A problem is gettable if nearly every contestant feels like they could have gotten the problem on a good day. (In particular, problems that require knowledge that not all contestants have are not gettable, even if they are easy with that knowledge.)
- A problem is a gimme if nearly every contestant actually solves the problem on the contest.
The consensus is always that the early problems should be gettable but not gimme’s. You could start every contest by asking the contestant to compute the expected value of 7, but the contestants are going to notice, and it isn’t going to help anyone.
(I guess I should make the point that in order for a problem to be a “gimme”, it would have to be so easy as to be almost insulting, because high accuracy on a given problem is really only possible if the level of the problem is significantly below the level of the student. So a gimme would have to be a problem that is way easier than the level of the weakest contestant — you can see why these would be bad.)
In contrast, with a gettable problem, even though some of the contestants will miss it, they’ll often miss it for a reason like 2+3=6. This is a bit unfortunate, but it is still a lot better if the contestant goes home thinking “I made a small arithmetic error, so I have to be more careful” than “there’s no way I could have gotten this, it was hopeless”.
But that brings me to the next point:
2. At the IMO 33% of the problems are gettable
At the IMO, there are two easy problems (one each day), but there are only six problems. So a full one-third of the problems are gettable: we hope that most students attending the IMO can solve either IMO1 or IMO4, even though many will not solve both.
If you are writing HMMT or some similar contest, I think this means you should think about the opening in terms of the fraction 1/3, rather than problem 1. For example, at HMMT, I think the czars should strive instead to make the first three or four of the ten problems on each individual test gettable: they should be problems every contestant could solve, even though some contestants will still miss them anyway. Under the pressure of the contest, students are going to make all sorts of mistakes, and so it’s important that there are multiple gettable problems. This way, every student has two or three or four real chances to solve a problem: they’ll still miss a few, but at least they feel like they could do something.
(Every year at HMMT, when we look back at the tests in hindsight, the first reflex many czars have is to look at how many people got 0’s on each test, and hope that it’s not too many. The fact that this figure is even worth looking at is in my opinion a sign that we are doing things wrong: is 1/10 any better than 0/10, if the kid solved question 1 quickly and then spent the rest of the hour staring at the other nine?)
3. Watch the clock
The other thing I want to say is to spend some time thinking about the entire test as a whole, rather than about each problem individually.
To drive the point home: I’m willing to bet that an HMMT individual test with 4 easy, 6 medium, and 0 hard problems could actually work, even at the top end of the scores. Each medium problem in isolation won’t distinguish the strongest students. But put six of them all together, and you get two effects:
- Students will make mistakes on some of the problems, and by the central limit theorem you’ll get a curve anyways.
- Time pressure becomes significantly more important, and the strongest students will come out ahead by simply being faster.
Of course, I’ll never be able to persuade the problem czars (myself included) to not include at least one or two of those super-nice hard problems. But the point is that they’re not actually needed in situations like HMMT, when there are so many problems that it’s hard to not get a curve of scores.
One suggestion many people won’t take: if you really want to include some difficult problems that will take a while, decrease the length of the test. If you had 3 easy, 3 medium, and 1 hard problem, I bet that could work too. One hour is really not very much time.
Actually, this has been experimentally verified. On my HMMT 2016 Geometry test, nobody solved any of problems 8-10, so the test was essentially seven problems long. The gradient of scores at the top and center still ended up being okay. The only issue was that a third of the students solved zero problems, because the easy problems were either error-prone, or else were hit-or-miss (either solved quickly or not at all). Thus that’s another thing to watch out for.
I recently had a combinatorics paper appear in the EJC. In this post I want to brag a bit by telling the “story” of this paper: what motivated it, how I found the conjecture that I originally did, and the process that eventually led me to the proof, and so on.
This work was part of the Duluth REU 2017, and I thank Joe Gallian for suggesting the problem.
Let me begin by formulating the problem as it was given to me. First, here is the definition and notation for a “block-ascending” permutation.
It turns out that block-ascending permutations which also avoid an increasing subsequence of certain length have nice enumerative properties. To this end, we define the following notation.
(The reason for using will be explained later.) In particular, if .
Now on to the results. A 2011 paper by Joel Brewster Lewis (JBL) proved (among other things) the following result:
Just before the Duluth REU in 2017, Mei and Wang proved that in fact, in Lewis’ result one may freely mix and ‘s. To simplify notation,
The proof uses the RSK correspondence, but the authors posed at the end of the paper the following open problem:
This was the first problem that I was asked to work on. (I remember I received the problem on Sunday morning; this actually matters a bit for the narrative later.)
At this point I should pause to mention that this notation is my own invention, and did not exist when I originally started working on the problem. Indeed, all the results are restricted to the case where for each , and so it was unnecessary to think about other possibilities for : Mei and Wang’s paper uses the notation . So while I’ll continue to use the notation in the blog post for readability, it will make some of the steps more obvious than they actually were.
2. Setting out
Mei and Wang’s paper originally suggested that rather than finding a bijection for any and , it would suffice to biject
and then compose two such bijections. I didn’t see why this should be much easier, but it didn’t seem to hurt either.
As an example, they show how to do this bijection with and . Indeed, suppose . Then is an increasing sequence of length right at the start of . So had better be the largest element in the permutation: otherwise the biggest element, appearing later, would already complete an increasing subsequence that is too long! So removing gives a bijection between .
But if you look carefully, this proof does essentially nothing with the later blocks. The exact same proof gives:
Once I found this proposition I rejected the initial suggestion of specializing . The “easy case” I had found told me that I could take a set and delete the single element from it. So empirically, my intuition from this toy example told me that it would be easier to find bijections where and were only “a little different”, and hope that the resulting bijection only changed things a little bit (in the same way that in the toy example, all the bijection did was delete one element). So I shifted to trying to find small changes of this form.
3. The fork in the road
3.1. Wishful thinking
I had a lucky break of wishful thinking here. In the notation with , I had found that one could replace with either or freely. (But this proof relied heavily on the fact that the block really is on the far left.) So what other changes might I be able to make?
There were two immediate possibilities that came to my mind.
- Deletion: We already showed could be changed from to for any . If we can do a similar deletion with for any , not just , then we would be done.
- Swapping: If we can show that two adjacent ‘s could be swapped, that would be sufficient as well. (It’s also possible to swap non-adjacent ‘s, but that would cause more disruption for no extra benefit.)
Now, I had two paths that both seemed plausible to chase after. How was I supposed to know which one to pick? (Of course, it’s possible neither work, but you have to start somewhere.)
Well, maybe the correct thing to do would have been to just try both. But it was Sunday afternoon by the time I got to this point. Granted, it was summer already, but I knew that come Monday I would have doctor appointments and other trivial errands to distract me, so I decided I should pick one of them and throw the rest of the day into it. But that meant I had to pick one.
(I confess that I actually already had a prior guess: the deletion approach seemed less likely to work than the swapping approach. In the deletion approach, if is somewhere in the middle of the permutation, it seemed like deleting an element could cause a lot of disruption. But the swapping approach preserved the total number of elements involved, and so seemed more likely that I could preserve structure. But really I was just grasping at straws.)
3.2. Enter C++
Yeah, I cheated. Sorry.
Those of you that know anything about my style of math know that I am an algebraist by nature — sort of. It’s more accurate to say that I depend on having concrete examples to function. True, I can’t do complexity theory for my life, but I also haven’t been able to get the hang of algebraic geometry, despite having tried to learn it three or four times by now. But enumerative combinatorics? OH LOOK EXAMPLES.
Here’s the plan: let . Then using a C++ computer program:
- Enumerate all the permutations in .
- Enumerate all the permutations in .
- Enumerate all the permutations in .
If the deletion approach is right, then I would hope and look pretty similar. On the flip side, if the swapping approach is right, then and should look close to each other instead.
It’s moments like this where my style of math really shines. I don’t have to make decisions like the above off gut-feeling: do the “data science” instead.
3.3. A twist of fate
Except this isn’t actually what I did, since there was one problem. Computing the longest increasing subsequence of a length-$n$ permutation takes $O(n \log n)$ time, and there are $n!$ or so permutations to enumerate. But for the lengths in question, $n!$ is a pretty big number. Unfortunately, my computer is not really that fast, and I didn’t really have the patience to implement the “correct” algorithms to bring the runtime down.
The solution? Use instead.
In a deep irony that I didn’t realize at the time, it was this moment when I introduced the notation, and for the first time allowed the to not be in . My reasoning was that since I was only doing this for heuristic reasons, I could instead work with and probably not change much about the structure of the problem, while replacing , which would run times faster. This was okay since all I wanted to do was see how much changing the “middle” would disrupt the structure.
And so the new plan was:
- Enumerate all the permutations in .
- Enumerate all the permutations in .
- Enumerate all the permutations in .
I admit I never actually ran the enumeration with , because the route with and turned out to be even more promising than I expected. When I compared the empirical data for the sets and , I found that the number of permutations with any particular triple was equal. In other words, the outer blocks were preserved: the bijection
does not tamper with the outside blocks of length and .
This meant I was ready to make the following conjecture. Suppose , . There is a bijection
which only involves rearranging the elements of the th and st blocks.
4. Rooting out the bijection
At this point I was in quite a good position. I had pinned down the problem to finding a particular bijection that I was confident had to exist, since it was showing up in the empirical data in exact detail.
Let’s call this mythical bijection . How could I figure out what it was?
4.1. Hunch: preserves order-isomorphism
Let me quickly introduce a definition.
Now I guessed one more property of : it should preserve order-isomorphism.
What do I mean by this? Suppose in one context changed to ; then we would expect that in another situation we should have changing to . Indeed, we expect (empirically) to not touch surrounding outside blocks, and so it would be very strange if behaved differently due to far-away numbers it wasn’t even touching.
So actually I’ll just write
for this example, reducing the words in question.
4.2. Keep cheating
With this hunch it’s possible to cheat with C++ again. Here’s how.
Let’s for concreteness suppose and the particular sets
Well, it turns out if you look at the data:
- The only element of which starts with and ends with is .
- The only element of which starts with and ends with is .
So that means that is changed to . Thus the empirical data shows that
In general, it might not be that clear cut. For example, if we look at the permutations starting with and , there is more than one candidate.
- and are both in .
- and are both in .
but we can’t tell which one goes to which (although you might be able to guess).
Fortunately, there is lots of data. This example narrowed down to two values, but if you look at other places you might have different data on . Since we think is behaving the same “globally”, we can piece together different pieces of data to get narrower sets. Even better, is a bijection, so once we match either of or , we’ve matched the other.
You know what this sounds like? Perfect matchings.
So here’s the experimental procedure.
- Enumerate all permutations in and .
- Take each possible tuple , and look at the permutations that start and end with those particular four elements. Record the reductions of and for all these permutations. We call these input words and output words, respectively. Each output word is a “candidate” of for an input word.
- For each input word that appeared, take the intersection of all output words that appeared. This gives a bipartite graph , with input words being matched to their candidates.
- Find perfect matchings of the graph.
And with any luck that would tell us what is.
Luckily, the bipartite graph is quite sparse, and there was only one perfect matching.
246|1357 => 2467|135
247|1356 => 2457|136
256|1347 => 2567|134
257|1346 => 2357|146
267|1345 => 2367|145
346|1257 => 3467|125
347|1256 => 3457|126
356|1247 => 3567|124
357|1246 => 1357|246
367|1245 => 1367|245
456|1237 => 4567|123
457|1236 => 1457|236
467|1235 => 1467|235
567|1234 => 1567|234
If you look at the data, well, there are some clear patterns. Exactly one number is “moving” over from the right half each time. Also, if is on the right half, then it always moves over.
Anyways, if you stare at this for an hour, you can actually figure out the exact rule:
And indeed, once I had this bijection, it took maybe only another hour of thinking to verify that it works as advertised, thus solving the original problem.
Rather than writing up what I had found, I celebrated that Sunday evening by playing Wesnoth for 2.5 hours.
On Monday morning I was mindlessly feeding inputs to the program I had worked on earlier and finally noticed that in fact and also had the same cardinality. Huh.
It seemed too good to be true, but I played around some more, and sure enough, the cardinality of seemed to only depend on the order of the ‘s. And so at last I stumbled upon the final form of the conjecture, realizing that all along the assumption that I had been working with was a red herring, and that the bijection was really true in much vaster generality. There is a bijection
which only involves rearranging the elements of the th and st blocks.
It also meant I had more work to do, and so I was now glad that I hadn’t written up my work from yesterday night.
5.2. More data science
I re-ran the experiment I had done before, now with . (This was interesting, because the elements in question could now have either longest increasing subsequence of length , or instead of length .)
The data I obtained was:
246|13578 => 24678|135
247|13568 => 24578|136
248|13567 => 24568|137
256|13478 => 25678|134
257|13468 => 23578|146
258|13467 => 23568|147
267|13458 => 23678|145
268|13457 => 23468|157
278|13456 => 23478|156
346|12578 => 34678|125
347|12568 => 34578|126
348|12567 => 34568|127
356|12478 => 35678|124
357|12468 => 13578|246
358|12467 => 13568|247
367|12458 => 13678|245
368|12457 => 13468|257
378|12456 => 13478|256
456|12378 => 45678|123
457|12368 => 14578|236
458|12367 => 14568|237
467|12358 => 14678|235
468|12357 => 12468|357
478|12356 => 12478|356
567|12348 => 15678|234
568|12347 => 12568|347
578|12346 => 12578|346
678|12345 => 12678|345
Okay, so it looks like:
- exactly two numbers are moving each time, and
- the length of the longest run is preserved.
Eventually, I was able to work out the details, but they’re more involved than I want to reproduce here. But the idea is that you can move elements “one at a time”: something like
while preserving the length of increasing subsequences at each step.
So, together with the easy observation from the beginning, this not only resolves the original problem, but also gives an elegant generalization. I had now proved:
6. Discovered vs invented
Whenever I look back on this, I can’t help thinking just how incredibly lucky I got on this project.
There’s this perpetual debate about whether mathematics is discovered or invented. I think it’s results like this which make the case for “discovered”. I did not really construct the bijection myself: it was “already there” and I found it by examining the data. In another world where did not exist, all the creativity in the world wouldn’t have changed anything.
So anyways, that’s the behind-the-scenes tour of my favorite combinatorics paper.
I wanted to quickly write this proof up, complete with pictures, so that I won’t forget it again. In this post I’ll give a combinatorial proof (due to Joyal) of the following:
Proof: We are going to construct a bijection between
- Functions $f \colon \{1, \dots, n\} \to \{1, \dots, n\}$ (of which there are $n^n$) and
- Trees on $\{1, \dots, n\}$ with two distinguished nodes $a$ and $b$ (possibly $a = b$).
This will imply the answer.
Let’s look at the first piece of data. We can visualize it as points floating around, each with an arrow going out of it pointing to another point, but possibly with many other arrows coming into it. Such a structure is apparently called a directed pseudoforest. Here is an example when .
You’ll notice that in each component, some of the points lie in a cycle and others do not. I’ve colored the former type of points blue, and the corresponding arrows magenta.
Thus a directed pseudoforest can also be specified by
- a choice of some vertices to be in cycles (blue vertices),
- a permutation on the blue vertices (magenta arrows), and
- attachments of trees to the blue vertices (grey vertices and arrows).
Now suppose we take the same information, but replace the permutation on the blue vertices with a total ordering instead (of course there are an equal number of these). Then we can string the blue vertices together as shown below, where the green arrows denote the selected total ordering (in this case ):
This is exactly the data of a tree on the vertices with two distinguished vertices, the first and last in the chain of green (which could possibly coincide).