A story of block-ascending permutations

I recently had a combinatorics paper appear in the EJC. In this post I want to brag a bit by telling the “story” of this paper: what motivated it, how I found the original conjecture, the process that eventually led me to the proof, and so on.

This work was part of the Duluth REU 2017, and I thank Joe Gallian for suggesting the problem.

1. Background

Let me begin by formulating the problem as it was given to me. First, here is the definition and notation for a “block-ascending” permutation.

Definition 1

For nonnegative integers {a_1}, …, {a_n} an {(a_1, \dots, a_n)}-ascending permutation is a permutation on {\{1, 2, \dots, a_1 + \dots + a_n\}} whose descent set is contained in {\{a_1, a_1+a_2, \dots, a_1+\dots+a_{n-1}\}}. In other words the permutation ascends in blocks of length {a_1}, {a_2}, …, {a_n}, and thus has the form

\displaystyle \pi = \pi_{11} \dots \pi_{1a_1} | \pi_{21} \dots \pi_{2a_2} | \dots | \pi_{n1} \dots \pi_{na_n}

for which {\pi_{i1} < \pi_{i2} < \dots < \pi_{ia_i}} for all {i}.

It turns out that block-ascending permutations which also avoid an increasing subsequence of certain length have nice enumerative properties. To this end, we define the following notation.

Definition 2

Let {\mathcal L_{k+2}(a_1, \dots, a_n)} denote the set of {(a_1, \dots, a_n)}-ascending permutations which avoid the pattern {12 \dots (k+2)}.

(The reason for using {k+2} will be explained later.) In particular, {\mathcal L_{k+2}(a_1, \dots, a_n) = \varnothing} if {\max \{a_1, \dots, a_n\} \ge k+2}.
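These sets are small enough to enumerate by brute force, which will matter later in the story. Here is a quick Python sketch of such an enumerator (the helper names `lis` and `L` are my own; my actual experiments, described later, were in C++):

```python
from itertools import permutations
from bisect import bisect_left

def lis(seq):
    # length of the longest increasing subsequence (patience sorting)
    tails = []
    for x in seq:
        i = bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)
        else:
            tails[i] = x
    return len(tails)

def L(k, blocks):
    # all (a_1, ..., a_n)-ascending permutations of {1, ..., a_1 + ... + a_n}
    # whose longest increasing subsequence has length at most k+1,
    # i.e. which avoid the pattern 12...(k+2)
    cuts, s = [], 0
    for a in blocks:
        cuts.append((s, s + a))
        s += a
    return [p for p in permutations(range(1, s + 1))
            if all(p[i] < p[i + 1] for lo, hi in cuts for i in range(lo, hi - 1))
            and lis(p) <= k + 1]
```

For instance, `len(L(1, [1, 1, 1]))` returns {5}, the number of {123}-avoiding permutations of length {3}.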

Example 3

Here is a picture of a permutation in {\mathcal L_7(3,2,4)} (but not in {\mathcal L_6(3,2,4)}, since one can see a shaded increasing subsequence of length {6}). We would denote it {134|69|2578}.

Now on to the results. A 2011 paper by Joel Brewster Lewis (JBL) proved (among other things) the following result:

Theorem 4 (Lewis 2011)

The sets {\mathcal L_{k+2}(k,k,\dots,k)} and {\mathcal L_{k+2}(k+1,k+1,\dots,k+1)} are in bijection with Young tableaux of shape {\left< (k+1)^n \right>}.

Remark 5

When {k=1}, this implies that {\mathcal L_3(1,1,\dots,1)}, the set of {123}-avoiding permutations of length {n}, is counted by the Catalan numbers; so is {\mathcal L_3(2,\dots,2)}, the set of {123}-avoiding zig-zag permutations.

Just before the Duluth REU in 2017, Mei and Wang proved that in fact, in Lewis’ result one may freely mix {k}’s and {k+1}’s. To simplify notation:

Definition 6

Let {I \subseteq \left\{ 1,\dots,n \right\}}. Then {\mathcal L(n,k,I)} denotes {\mathcal L_{k+2}(a_1,\dots,a_n)} where

\displaystyle a_i = \begin{cases} k+1 & i \in I \\ k & i \notin I. \end{cases}

Theorem 7 (Mei, Wang 2017)

The {2^n} sets {\mathcal L(n,k,I)} are also in bijection with Young tableaux of shape {\left< (k+1)^n \right>}.

The proof uses the RSK correspondence, but the authors posed at the end of the paper the following open problem:


Find a direct bijection between the {2^n} sets {\mathcal L(n,k,I)} above, not involving the RSK correspondence.

This was the first problem that I was asked to work on. (I remember I received the problem on Sunday morning; this actually matters a bit for the narrative later.)

At this point I should pause to mention that this {\mathcal L_{k+2}(\dots)} notation is my own invention, and did not exist when I originally started working on the problem. Indeed, all the results so far are restricted to the case where {a_i \in \{k,k+1\}} for each {i}, and so it was unnecessary to think about other possibilities for {a_i}: Mei and Wang’s paper uses the notation {\mathcal L(n,k,I)}. So while I’ll continue to use the {\mathcal L_{k+2}(\dots)} notation in this blog post for readability, it will make some of the steps look more obvious than they actually were.

2. Setting out

Mei and Wang’s paper originally suggested that rather than finding a bijection {\mathcal L(n,k,I) \rightarrow \mathcal L(n,k,J)} for any {I} and {J}, it would suffice to biject

\displaystyle \mathcal L(n,k,I) \rightarrow \mathcal L(n,k,\varnothing)

and then compose two such bijections. I didn’t see why this should be much easier, but it didn’t seem to hurt either.

As an example, they show how to do this bijection with {I = \{1\}} and {I = \{n\}}. Indeed, suppose {I = \{1\}}. Then {\pi_{11} < \pi_{12} < \dots < \pi_{1(k+1)}} is an increasing sequence of length {k+1} right at the start of {\pi}. So {\pi_{1(k+1)}} had better be the largest element in the permutation: otherwise, the largest element would appear later in {\pi} and complete an increasing subsequence of length {k+2}. So removing {\pi_{1(k+1)}} gives a bijection {\mathcal L(n,k,\{1\}) \rightarrow \mathcal L(n,k,\varnothing)}.

But if you look carefully, this proof does essentially nothing with the later blocks. The exact same proof gives:

Proposition 8

Suppose {1 \notin I}. Then there is a bijection

\displaystyle \mathcal L(n,k,I \cup \{1\}) \rightarrow \mathcal L(n,k,I)

by deleting the {(k+1)}st element of the permutation (which must be the largest one).
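For {k = 2} and {I = \varnothing}, Proposition 8 can be confirmed by brute force. This is a Python sketch with hypothetical helpers `lis` and `L` enumerating the sets of Definition 2:

```python
from itertools import permutations
from bisect import bisect_left

def lis(seq):
    # length of the longest increasing subsequence
    tails = []
    for x in seq:
        i = bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)
        else:
            tails[i] = x
    return len(tails)

def L(k, blocks):
    # enumerate the (a_1,...,a_n)-ascending permutations avoiding 12...(k+2)
    cuts, s = [], 0
    for a in blocks:
        cuts.append((s, s + a))
        s += a
    return [p for p in permutations(range(1, s + 1))
            if all(p[i] < p[i + 1] for lo, hi in cuts for i in range(lo, hi - 1))
            and lis(p) <= k + 1]

k = 2
src = L(k, [k + 1, k])   # first block lengthened to k+1
dst = L(k, [k, k])
# the (k+1)st entry is always the maximum, and deleting it is a bijection
assert all(p[k] == len(p) for p in src)
assert sorted(p[:k] + p[k + 1:] for p in src) == sorted(dst)
```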

Once I found this proposition I rejected the initial suggestion of specializing to {\mathcal L(n,k,I) \rightarrow \mathcal L(n,k,\varnothing)}. The “easy case” I had found told me that I could take a set {I} and delete the single element {1} from it. So my intuition from this toy example was that it would be easier to find bijections {\mathcal L(n,k,I) \rightarrow \mathcal L(n,k,I')} where {I} and {I'} were only “a little different”, and hope that the resulting bijection only changed things a little bit (in the same way that in the toy example, all the bijection did was delete one element). So I shifted to trying to find small changes of this form.

3. The fork in the road

3.1. Wishful thinking

I had a lucky break of wishful thinking here. In the notation {\mathcal L_{k+2}(a_1, \dots, a_n)} with {a_i \in \{k,k+1\}}, I had found that one could replace {a_1} with either {k} or {k+1} freely. (But this proof relied heavily on the block in question being on the far left.) So what other changes might I be able to make?

There were two immediate possibilities that came to my mind.

  • Deletion: We already showed {a_1} could be changed from {k+1} to {k}. If we can do a similar deletion with {a_i} for any {i}, not just {i=1}, then we would be done.
  • Swapping: If we can show that two adjacent {a_i}‘s could be swapped, that would be sufficient as well. (It’s also possible to swap non-adjacent {a_i}‘s, but that would cause more disruption for no extra benefit.)

Now, I had two paths that both seemed plausible to chase after. How was I supposed to know which one to pick? (Of course, it’s possible neither works, but you have to start somewhere.)

Well, maybe the correct thing to do would have been to just try both. But it was Sunday afternoon by the time I got to this point. Granted, it was summer already, but I knew that come Monday I would have doctor appointments and other trivial errands to distract me, so I decided I should pick one of the two and throw the rest of the day into it. But that meant I had to choose.

(I confess that I actually already had a prior guess: the deletion approach seemed less likely to work than the swapping approach. In the deletion approach, if {i} is somewhere in the middle of the permutation, it seemed like deleting an element could cause a lot of disruption. The swapping approach, on the other hand, preserved the total number of elements involved, and so it seemed more likely to preserve structure. But really I was just grasping at straws.)

3.2. Enter C++

Yeah, I cheated. Sorry.

Those of you that know anything about my style of math know that I am an algebraist by nature — sort of. It’s more accurate to say that I depend on having concrete examples to function. True, I can’t do complexity theory for my life, but I also haven’t been able to get the hang of algebraic geometry, despite having tried to learn it three or four times by now. But enumerative combinatorics? OH LOOK EXAMPLES.

Here’s the plan: let {k=3}. Then using a C++ computer program:

  • Enumerate all the permutations in {S = \mathcal L_{k+2}(3,4,3,4)}.
  • Enumerate all the permutations in {A = \mathcal L_{k+2}(3,3,3,4)}.
  • Enumerate all the permutations in {B = \mathcal L_{k+2}(3,3,4,4)}.

If the deletion approach is right, then I would hope {S} and {A} look pretty similar. On the flip side, if the swapping approach is right, then {S} and {B} should look close to each other instead.

It’s moments like this where my style of math really shines. I don’t have to make decisions like the above off gut-feeling: do the “data science” instead.

3.3. A twist of fate

Except this isn’t actually what I did, since there was one problem. Computing the longest increasing subsequence of a length {N} permutation takes {O(N \log N)} time, and there are {N!} or so permutations. But when {N = 3+4+3+4=14}, we have {N! \cdot N \log N \approx 3 \cdot 10^{12}}, which is a pretty big number. Unfortunately, my computer is not really that fast, and I didn’t really have the patience to implement the “correct” algorithms to bring the runtime down.

The solution? Use {N = 1+4+3+2 = 10} instead.

In a deep irony that I didn’t realize at the time, it was this moment when I introduced the {\mathcal L_{k+2}(a_1, \dots, a_n)} notation, and for the first time allowed the {a_i} to not be in {\{k,k+1\}}. My reasoning was that since I was only doing this for heuristic reasons, I could instead work with {S = \mathcal L_{k+2}(1,4,3,2)} and probably not change much about the structure of the problem, while replacing {N = 1 + 4 + 3 + 2 = 10}, which would run thousands of times faster. This was okay since all I wanted to do was see how much changing the “middle” would disrupt the structure.

And so the new plan was:

  • Enumerate all the permutations in {S = \mathcal L_{k+2}(1,4,3,2)}.
  • Enumerate all the permutations in {A = \mathcal L_{k+2}(1,3,3,2)}.
  • Enumerate all the permutations in {B = \mathcal L_{k+2}(1,3,4,2)}.

I admit I never actually ran the enumeration with {A}, because the route with {S} and {B} turned out to be even more promising than I expected. When I compared the empirical data for the sets {S} and {B}, I found that the numbers of permutations with any particular triple {(\pi_1, \pi_9, \pi_{10})} were equal. In other words, the outer blocks were preserved: the bijection

\displaystyle \mathcal L_{k+2}(1,4,3,2) \rightarrow \mathcal L_{k+2}(1,3,4,2)

does not tamper with the outside blocks of length {1} and {2}.

This meant I was ready to make the following conjecture. Suppose {a_i = k}, {a_{i+1} = k+1}. There is a bijection

\displaystyle \mathcal L_{k+2}(a_1, \dots, a_i, a_{i+1}, \dots, a_n) \rightarrow \mathcal L_{k+2}(a_1, \dots, a_{i+1}, a_{i}, \dots, a_n)

which only involves rearranging the elements of the {i}th and {(i+1)}st blocks.
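The conjecture is easy to stress-test on an instance that enumerates instantly, say the pair {\mathcal L_4(1,3,2,1)} and {\mathcal L_4(1,2,3,1)} (a Python sketch with my own helper names; it checks both the cardinalities and the outer-block distributions):

```python
from itertools import permutations
from bisect import bisect_left
from collections import Counter

def lis(seq):
    # length of the longest increasing subsequence
    tails = []
    for x in seq:
        i = bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)
        else:
            tails[i] = x
    return len(tails)

def L(k, blocks):
    # enumerate the (a_1,...,a_n)-ascending permutations avoiding 12...(k+2)
    cuts, s = [], 0
    for a in blocks:
        cuts.append((s, s + a))
        s += a
    return [p for p in permutations(range(1, s + 1))
            if all(p[i] < p[i + 1] for lo, hi in cuts for i in range(lo, hi - 1))
            and lis(p) <= k + 1]

k = 2
S = L(k, [1, 3, 2, 1])
T = L(k, [1, 2, 3, 1])
# same cardinality, and the joint distribution of the outer entries
# (pi_1, pi_7) agrees, as the conjectured bijection would force
assert len(S) == len(T)
assert Counter((p[0], p[-1]) for p in S) == Counter((p[0], p[-1]) for p in T)
```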

4. Rooting out the bijection

At this point I was in quite a good position. I had pinned down the problem to finding a particular bijection that I was confident had to exist, since it was showing up in the empirical data.

Let’s call this mythical bijection {\mathbf W}. How could I figure out what it was?

4.1. Hunch: {\mathbf W} preserves order-isomorphism

Let me quickly introduce a definition.

Definition 9

We say two words {a_1 \dots a_m} and {b_1 \dots b_m} are order-isomorphic if {a_i < a_j} if and only if {b_i < b_j}. Then order-isomorphism gives equivalence classes, and there is a canonical representative where the letters are {\{1,2,\dots,m\}}; this is called a reduced word.

Example 10

The words {13957}, {12846} and {12534} are order-isomorphic; the last is reduced.

Now I guessed one more property of {\mathbf W}: it should preserve order-isomorphism.

What do I mean by this? Suppose in one context {139 | 57} changed to {39 | 157}; then we would expect that in another situation we should have {248 | 67} changing to {48 | 267}, since {139|57} and {248|67} are order-isomorphic. Indeed, we expect {\mathbf W} (empirically) to not touch the surrounding outside blocks, and so it would be very strange if {\mathbf W} behaved differently due to far-away numbers it wasn’t even touching.

So actually I’ll just write

\displaystyle \mathbf W(125|34) = 25|134

for this example, reducing the words in question.

4.2. Keep cheating

With this hunch it’s possible to cheat with C++ again. Here’s how.

Let’s for concreteness suppose {k=2}, and consider the particular bijection

\displaystyle \mathcal L_{k+2}(1,3,2,1) \rightarrow \mathcal L_{k+2}(1,2,3,1).

Well, it turns out if you look at the data:

  • The only element of {\mathcal L_{k+2}(1,3,2,1)} which starts with {2} and ends with {5} is {2|147|36|5}.
  • The only element of {\mathcal L_{k+2}(1,2,3,1)} which starts with {2} and ends with {5} is {2|47|136|5}.

So that means that {147 | 36} is changed to {47 | 136}. Thus the empirical data shows that

\displaystyle \mathbf W(135|24) = 35|124.

In general, it might not be that clear cut. For example, if we look at the permutations starting with {2} and ending with {4}, there is more than one.

  • {2 | 1 5 7 | 3 6 | 4} and {2 | 1 6 7 | 3 5 | 4} are both in {\mathcal L_{k+2}(1,3,2,1)}.
  • {2 | 5 7 | 1 3 6 | 4} and {2 | 6 7 | 1 3 5 | 4} are both in {\mathcal L_{k+2}(1,2,3,1)}.


So all we can conclude is that

\displaystyle \mathbf W( \{135|24, 145|23\} ) = \{35|124, 45|123\}

but we can’t tell which one goes to which (although you might be able to guess).

Fortunately, there is lots of data. This example narrowed {135|24} down to two values, but if you look at other places you might have different data on {135|24}. Since we think {\mathbf W} is behaving the same “globally”, we can piece together different pieces of data to get narrower sets. Even better, {\mathbf W} is a bijection, so once we match either of {135|24} or {145|23}, we’ve matched the other.

You know what this sounds like? Perfect matchings.

So here’s the experimental procedure.

  • Enumerate all permutations in {\mathcal L_{k+2}(2,3,4,2)} and {\mathcal L_{k+2}(2,4,3,2)}.
  • Take each possible tuple {(\pi_1, \pi_2, \pi_{10}, \pi_{11})}, and look at the permutations that start and end with those particular four elements. Record the reductions of {\pi_3\pi_4\pi_5|\pi_6\pi_7\pi_8\pi_9} and {\pi_3\pi_4\pi_5\pi_6|\pi_7\pi_8\pi_9} for all these permutations. We call these input words and output words, respectively. Each output word is a “candidate” of {\mathbf W} for an input word.
  • For each input word {a_1a_2a_3|b_1b_2b_3b_4} that appeared, take the intersection, over all tuples in which it appeared, of the sets of output words. This gives a bipartite graph {G}, with input words being matched to their candidates.
  • Find perfect matchings of the graph.

And with any luck that would tell us what {\mathbf W} is.
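Here is a miniature Python version of that procedure (my own reconstruction, run on the smaller pair {\mathcal L_4(1,3,2,1) \rightarrow \mathcal L_4(1,2,3,1)} from earlier instead of the eleven-element sets):

```python
from itertools import permutations
from bisect import bisect_left
from collections import defaultdict

def lis(seq):
    # length of the longest increasing subsequence
    tails = []
    for x in seq:
        i = bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)
        else:
            tails[i] = x
    return len(tails)

def L(k, blocks):
    # enumerate the (a_1,...,a_n)-ascending permutations avoiding 12...(k+2)
    cuts, s = [], 0
    for a in blocks:
        cuts.append((s, s + a))
        s += a
    return [p for p in permutations(range(1, s + 1))
            if all(p[i] < p[i + 1] for lo, hi in cuts for i in range(lo, hi - 1))
            and lis(p) <= k + 1]

def reduce_word(vals, split):
    # order-isomorphic reduction to the letters 1..m, keeping the divider
    rank = {v: i + 1 for i, v in enumerate(sorted(vals))}
    return (tuple(rank[v] for v in vals[:split]),
            tuple(rank[v] for v in vals[split:]))

def words_by_key(perms, split):
    # group the reduced middle words by the outer entries (first, last)
    d = defaultdict(set)
    for p in perms:
        d[(p[0], p[-1])].add(reduce_word(p[1:-1], split))
    return d

ins = words_by_key(L(2, [1, 3, 2, 1]), 3)
outs = words_by_key(L(2, [1, 2, 3, 1]), 2)

# candidates of an input word: intersect the output words over all keys
cand = {}
for key, words in ins.items():
    for w in words:
        cand[w] = cand.get(w, outs[key]) & outs[key]

# the data already pins down W(135|24) = 35|124, as claimed earlier
assert cand[((1, 3, 5), (2, 4))] == {((3, 5), (1, 2, 4))}
```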

4.3. Results

Luckily, the bipartite graph is quite sparse, and there was only one perfect matching.

246|1357 => 2467|135
247|1356 => 2457|136
256|1347 => 2567|134
257|1346 => 2357|146
267|1345 => 2367|145
346|1257 => 3467|125
347|1256 => 3457|126
356|1247 => 3567|124
357|1246 => 1357|246
367|1245 => 1367|245
456|1237 => 4567|123
457|1236 => 1457|236
467|1235 => 1467|235
567|1234 => 1567|234

If you look at the data, well, there are some clear patterns. Exactly one number is “moving” over from the right half each time. Also, if {7} is in the right half, then it always moves over.

Anyways, if you stare at this for an hour, you can actually figure out the exact rule:

Claim 11

Given an input {a_1a_2a_3|b_1b_2b_3b_4}, move {b_{i+1}}, where {i} is the largest index for which {a_i < b_{i+1}}; if no such index exists, move {b_1} (which must equal {1}).
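Claim 11 can be machine-checked against the table above; this short Python sketch replays all fourteen rows:

```python
def W(a, b):
    # move b_{i+1} for the largest i with a_i < b_{i+1}; else move b_1
    idx = 0
    for i in range(1, min(len(a), len(b) - 1) + 1):
        if a[i - 1] < b[i]:          # a_i < b_{i+1}, 1-indexed
            idx = i
    moved = b[idx] if idx else b[0]
    return sorted(a + [moved]), [x for x in b if x != moved]

table = """246|1357 => 2467|135  247|1356 => 2457|136  256|1347 => 2567|134
257|1346 => 2357|146  267|1345 => 2367|145  346|1257 => 3467|125
347|1256 => 3457|126  356|1247 => 3567|124  357|1246 => 1357|246
367|1245 => 1367|245  456|1237 => 4567|123  457|1236 => 1457|236
467|1235 => 1467|235  567|1234 => 1567|234"""

tokens = table.split()
rows = [(tokens[i], tokens[i + 2]) for i in range(0, len(tokens), 3)]
assert len(rows) == 14
for inp, out in rows:
    a, b = ([int(c) for c in part] for part in inp.split("|"))
    la, lb = W(a, b)
    assert "".join(map(str, la)) + "|" + "".join(map(str, lb)) == out
```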

And indeed, once I had this bijection, it took maybe only another hour of thinking to verify that it works as advertised, thus solving the original problem.

Rather than writing up what I had found, I celebrated that Sunday evening by playing Wesnoth for 2.5 hours.

5. Generalization

5.1. Surprise

On Monday morning I was mindlessly feeding inputs to the program I had worked on earlier and finally noticed that in fact {\mathcal L_6(1,3,5,2)} and {\mathcal L_6(1,5,3,2)} also had the same cardinality. Huh.

It seemed too good to be true, but I played around some more, and sure enough, the cardinality {\#\mathcal L_{k+2}(a_1, \dots, a_n)} seemed not to depend on the order of the {a_i}’s. And so at last I stumbled upon the final form of the conjecture, realizing that the assumption {a_i \in \{k,k+1\}} I had been working with all along was a red herring, and that the bijection was really true in much vaster generality. There is a bijection

\displaystyle \mathcal L_{k+2}(a_1, \dots, a_i, a_{i+1}, \dots, a_n) \rightarrow \mathcal L_{k+2}(a_1, \dots, a_{i+1}, a_{i}, \dots, a_n)

which only involves rearranging the elements of the {i}th and {(i+1)}st blocks.

It also meant I had more work to do, and so I was now glad that I hadn’t written up my work from yesterday night.

5.2. More data science

I re-ran the experiment I had done before, now with {\mathcal L_7(2,3,5,2) \rightarrow \mathcal L_7(2,5,3,2)}. (This was interesting, because the {8} elements in question could now have longest increasing subsequence of length either {5} or {6}.)

The data I obtained was:

246|13578 => 24678|135
247|13568 => 24578|136
248|13567 => 24568|137
256|13478 => 25678|134
257|13468 => 23578|146
258|13467 => 23568|147
267|13458 => 23678|145
268|13457 => 23468|157
278|13456 => 23478|156
346|12578 => 34678|125
347|12568 => 34578|126
348|12567 => 34568|127
356|12478 => 35678|124
357|12468 => 13578|246
358|12467 => 13568|247
367|12458 => 13678|245
368|12457 => 13468|257
378|12456 => 13478|256
456|12378 => 45678|123
457|12368 => 14578|236
458|12367 => 14568|237
467|12358 => 14678|235
468|12357 => 12468|357
478|12356 => 12478|356
567|12348 => 15678|234
568|12347 => 12568|347
578|12346 => 12578|346
678|12345 => 12678|345

Okay, so it looks like:

  • exactly two numbers are moving each time, and
  • the length of the longest run is preserved.

Eventually, I was able to work out the details, but they’re more involved than I want to reproduce here. But the idea is that you can move elements “one at a time”: something like

\displaystyle \mathcal L_{k+2}(7,4) \rightarrow \mathcal L_{k+2}(6,5) \rightarrow \mathcal L_{k+2}(5,6) \rightarrow \mathcal L_{k+2}(4,7)

while preserving the length of increasing subsequences at each step.

So, together with the easy observation from the beginning, this not only resolves the original problem, but also gives an elegant generalization. I had now proved:

Theorem 12

For any {a_1}, …, {a_n}, the cardinality

\displaystyle \# \mathcal L_{k+2}(a_1, \dots, a_n)

does not depend on the order of the {a_i}‘s.
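Theorem 12 too can be spot-checked by brute force on a heterogeneous example, e.g. all orderings of {(a_1, a_2, a_3) = (1,2,3)} with {k=3} (Python sketch, my own helper names):

```python
from itertools import permutations
from bisect import bisect_left

def lis(seq):
    # length of the longest increasing subsequence
    tails = []
    for x in seq:
        i = bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)
        else:
            tails[i] = x
    return len(tails)

def L(k, blocks):
    # enumerate the (a_1,...,a_n)-ascending permutations avoiding 12...(k+2)
    cuts, s = [], 0
    for a in blocks:
        cuts.append((s, s + a))
        s += a
    return [p for p in permutations(range(1, s + 1))
            if all(p[i] < p[i + 1] for lo, hi in cuts for i in range(lo, hi - 1))
            and lis(p) <= k + 1]

k = 3
counts = {order: len(L(k, list(order))) for order in permutations((1, 2, 3))}
# all 6 orderings of (1, 2, 3) give the same cardinality
assert len(set(counts.values())) == 1
```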

6. Discovered vs invented

Whenever I look back on this, I can’t help thinking just how incredibly lucky I got on this project.

There’s this perpetual debate about whether mathematics is discovered or invented. I think it’s results like this which make the case for “discovered”. I did not really construct the bijection {\mathbf W} myself: it was “already there” and I found it by examining the data. In another world where {\mathbf W} did not exist, all the creativity in the world wouldn’t have changed anything.

So anyways, that’s the behind-the-scenes tour of my favorite combinatorics paper.


Joyal’s Proof of Cayley’s Tree Formula

I wanted to quickly write this proof up, complete with pictures, so that I won’t forget it again. In this post I’ll give a combinatorial proof (due to Joyal) of the following:

Theorem 1 (Cayley’s Formula)

The number of trees on {n} labelled vertices is {n^{n-2}}.

Proof: We are going to construct a bijection between

  • Functions {\{1, 2, \dots, n\} \rightarrow \{1, 2, \dots, n\}} (of which there are {n^n}) and
  • Trees on {\{1, 2, \dots, n\}} with two distinguished nodes {A} and {B} (possibly {A=B}).

This will imply the answer: if {T_n} denotes the number of trees, the second count equals {n^2 \cdot T_n}, while the first is {n^n}, so {T_n = n^{n-2}}.

Let’s look at the first piece of data. We can visualize it as {n} points floating around, each with an arrow going out of it pointing to another point, but possibly with many other arrows coming into it. Such a structure is apparently called a directed pseudoforest. Here is an example when {n = 9}.


You’ll notice that in each component, some of the points lie in a cycle and others do not. I’ve colored the former type of points blue, and the corresponding arrows magenta.

Thus a directed pseudoforest can also be specified by

  • a choice of some vertices to be in cycles (blue vertices),
  • a permutation on the blue vertices (magenta arrows), and
  • attachments of trees to the blue vertices (grey vertices and arrows).

Now suppose we take the same information, but replace the permutation on the blue vertices with a total ordering instead (of course there are an equal number of these). Then we can string the blue vertices together as shown below, where the green arrows denote the selected total ordering (in this case {1 < 9 < 2 < 4 < 8 < 5}):


This is exactly the data of a tree on the {n} vertices with two distinguished vertices, namely the first and last vertices of the green chain (which could possibly coincide). \Box
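The proof is constructive enough to implement and verify for small {n}. In the Python sketch below (my own conventions: the chain is read off as {f(m_1), \dots, f(m_j)} where {m_1 < \dots < m_j} are the cyclic vertices), checking all {4^4 = 256} functions on {4} vertices confirms the count {4^2 = 16}:

```python
from itertools import product

def joyal(f, n):
    # f: dict {1..n} -> {1..n}; returns (tree edge set, A, B)
    cyclic = []
    for v in range(1, n + 1):
        x = v
        for _ in range(n):
            x = f[x]
            if x == v:               # v returns to itself: it lies on a cycle
                cyclic.append(v)
                break
    # string the cyclic (blue) vertices together along the chosen total order
    chain = [f[v] for v in sorted(cyclic)]
    edges = {frozenset((chain[i], chain[i + 1])) for i in range(len(chain) - 1)}
    for v in range(1, n + 1):
        if v not in cyclic:
            edges.add(frozenset((v, f[v])))   # hang the grey trees back on
    return frozenset(edges), chain[0], chain[-1]

n = 4
images = {joyal(dict(enumerate(vals, start=1)), n)
          for vals in product(range(1, n + 1), repeat=n)}
trees = {t for t, A, B in images}
```

Since all {256} images are distinct triples (tree, {A}, {B}) and each tree carries {n^2 = 16} choices of marks, there are exactly {16 = 4^{4-2}} labelled trees on {4} vertices.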

Combinatorial Nullstellensatz and List Coloring

More than six months late, but here are notes from the combinatorial nullstellensatz talk I gave at the student colloquium at MIT. This was also my term paper for 18.434, “Seminar in Theoretical Computer Science”.

1. Introducing the choice number

One of the most fundamental problems in graph theory is that of graph coloring, in which one assigns a color to every vertex of a graph so that no two adjacent vertices have the same color. The most basic invariant related to graph coloring is the chromatic number:

Definition 1

A simple graph {G} is {k}-colorable if it’s possible to properly color its vertices with {k} colors. The smallest such {k} is the chromatic number {\chi(G)}.

In this exposition we study a more general notion in which the set of permitted colors is different for each vertex, as long as at least {k} colors are listed at each vertex. This leads to the notion of a so-called choice number, which was introduced by Erdős, Rubin, and Taylor.

Definition 2

A simple graph {G} is {k}-choosable if it’s possible to properly color its vertices given a list of {k} colors at each vertex. The smallest such {k} is the choice number {\mathop{\mathrm{ch}}(G)}.

Example 3

We have {\mathop{\mathrm{ch}}(C_{2n}) = \chi(C_{2n}) = 2} for any integer {n} (here {C_{2n}} is the cycle graph on {2n} vertices). To see this, we only have to show that given a list of two colors at each vertex of {C_{2n}}, we can select one of them.

  • If the list of colors is the same at each vertex, then since {C_{2n}} is bipartite, we are done.
  • Otherwise, suppose adjacent vertices {v_1} and {v_{2n}} are such that some color {c} in the list at {v_1} is not in the list at {v_{2n}}. Select {c} at {v_1}, and then greedily color {v_2}, \dots, {v_{2n}} in that order.
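The two cases above can be corroborated by exhaustion for {C_4}; this Python sketch (my own, restricting the lists to a 3-color palette just to keep the search finite) confirms that every assignment of {2}-element lists admits a proper coloring:

```python
from itertools import combinations, product

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]        # the cycle C_4

def colorable(lists):
    # can we pick one color per vertex, properly, from the given lists?
    return any(all(choice[u] != choice[v] for u, v in edges)
               for choice in product(*lists))

pairs = list(combinations(range(3), 2))          # all 2-element lists
assert all(colorable(lists) for lists in product(pairs, repeat=4))
```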

We are thus naturally interested in how the choice number and the chromatic number are related. Of course we always have

\displaystyle \mathop{\mathrm{ch}}(G) \ge \chi(G).

Naïvely one might expect that we in fact have an equality, since allowing the colors at vertices to be different seems like it should make the graph easier to color. However, the following example shows that this is not the case.

Example 4 (Erdős)

Let {n \ge 1} be an integer and define

\displaystyle G = K_{n^n, n}.

We claim that for any integer {n \ge 1} we have

\displaystyle \mathop{\mathrm{ch}}(G) \ge n+1 \quad\text{and}\quad \chi(G) = 2.

The latter equality follows from {G} being bipartite.

Now to see the first inequality, let {G} have vertex set {U \cup V}, where {U} is the set of functions {u : [n] \rightarrow [n]} and {V = [n]}. Then consider {n^2} colors {C_{i,j}} for {1 \le i, j \le n}. On a vertex {u \in U}, we list colors {C_{1,u(1)}}, {C_{2,u(2)}}, \dots, {C_{n,u(n)}}. On a vertex {v \in V}, we list colors {C_{v,1}}, {C_{v,2}}, \dots, {C_{v,n}}. By construction it is impossible to properly color {G} with these colors: a proper coloring would select the color {C_{v, g(v)}} at each {v \in V} for some function {g \colon [n] \rightarrow [n]}, but then every color listed at the vertex {u = g \in U} is already used by one of its neighbors in {V}.
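For {n = 2} the construction is small enough to verify exhaustively. In this Python sketch the color {C_{i,j}} is encoded as the pair {(i,j)} (my encoding), and the brute-force search confirms that no proper coloring of {K_{4,2}} from these lists exists:

```python
from itertools import product

n = 2
U = list(product(range(1, n + 1), repeat=n))     # functions u: [n] -> [n]
V = list(range(1, n + 1))
list_u = {u: [(i, u[i - 1]) for i in range(1, n + 1)] for u in U}
list_v = {v: [(v, j) for j in range(1, n + 1)] for v in V}

# K_{n^n, n} is complete bipartite: every u-color must differ from
# every v-color in a proper coloring
def exists_proper():
    for cu in product(*(list_u[u] for u in U)):
        for cv in product(*(list_v[v] for v in V)):
            if all(x != y for x in cu for y in cv):
                return True
    return False

assert not exists_proper()       # hence ch(K_{4,2}) >= n + 1 = 3
```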

The case {n = 3} is illustrated in the figure below (image in public domain).


This surprising behavior is the subject of much research: how can we bound the choice number of a graph as a function of its chromatic number and other properties of the graph? We see that the above example requires exponentially many vertices in {n}.

Theorem 5 (Noel, West, Wu, Zhu)

If {G} is a graph with {n} vertices then

\displaystyle \chi(G) \le \mathop{\mathrm{ch}}(G) \le \max\left( \chi(G), \left\lceil \frac{\chi(G)+n-1}{3} \right\rceil \right).

In particular, if {n \le 2\chi(G)+1} then {\mathop{\mathrm{ch}}(G) = \chi(G)}.

One of the major open problems in this direction is the following.

Definition 6

A claw-free graph is a graph with no induced {K_{3,1}}. For example, the line graph (also called edge graph) of any simple graph {G} is claw-free.

Conjecture: if {G} is a claw-free graph, then {\mathop{\mathrm{ch}}(G) = \chi(G)}. In particular, this conjecture implies that for edge coloring, the notions of “chromatic number” and “choice number” coincide.


In this exposition, we prove the following result of Alon.

Theorem 7 (Alon)

A bipartite graph {G} is {\left\lceil L(G) \right\rceil+1}-choosable, where

\displaystyle L(G) \overset{\mathrm{def}}{=} \max_{H \subseteq G} |E(H)|/|V(H)|

is half the maximum of the average degree of subgraphs {H}.

In particular, recall that a planar bipartite graph {H} with {r \ge 3} vertices contains at most {2r-4} edges. Thus for such graphs we have {L(G) \le 2} and deduce:

Corollary 8

A planar bipartite graph is {3}-choosable.

This corollary is sharp, as it applies to {K_{2,4}} which we have seen in Example 4 has {\mathop{\mathrm{ch}}(K_{2,4}) = 3}.

The rest of the paper is divided as follows. First, we begin in §2 by stating Theorem 9, the famous combinatorial nullstellensatz of Alon. Then in §3 and §4, we provide descriptions of the so-called graph polynomial, to which we then apply combinatorial nullstellensatz to deduce Theorem 18. Finally in §5, we show how to use Theorem 18 to prove Theorem 7.

2. Combinatorial Nullstellensatz

The main tool we use is the Combinatorial Nullstellensatz of Alon.

Theorem 9 (Combinatorial Nullstellensatz)

Let {F} be a field, and let {f \in F[x_1, \dots, x_n]} be a polynomial of degree {t_1 + \dots + t_n}. Let {S_1, S_2, \dots, S_n \subseteq F} such that {\left\lvert S_i \right\rvert > t_i} for all {i}.

Assume the coefficient of {x_1^{t_1}x_2^{t_2}\dots x_n^{t_n}} of {f} is not zero. Then we can pick {s_1 \in S_1}, \dots, {s_n \in S_n} such that

\displaystyle f(s_1, s_2, \dots, s_n) \neq 0.

Example 10

Let us give a second proof that

\displaystyle \mathop{\mathrm{ch}}(C_{2n}) = 2

for every positive integer {n}. Our proof will be an application of the Nullstellensatz.

Regard the colors as real numbers, and let {S_i} be the set of colors at vertex {i} (hence {1 \le i \le 2n}, and {|S_i| = 2}). Consider the polynomial

\displaystyle f = \left( x_1-x_2 \right)\left( x_2-x_3 \right) \dots \left( x_{2n-1}-x_{2n} \right)\left( x_{2n}-x_1 \right)

The coefficient of {x_1^1 x_2^1 \dots x_{2n}^1} is {2 \neq 0}. Therefore, one can select a color from each {S_i} so that {f} does not vanish.
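The coefficient computation is easy to confirm by expanding the product directly; in this Python sketch (my own; each factor is an ordered pair, exactly as written in the display above) even cycles give coefficient {2} while an odd cycle gives {0}:

```python
from itertools import product

def expand(factors, n):
    # coefficients of prod (x_a - x_b) as a dict: exponent tuple -> coefficient
    coeffs = {}
    for choice in product((0, 1), repeat=len(factors)):
        exp, sign = [0] * n, 1
        for (a, b), c in zip(factors, choice):
            if c == 0:
                exp[a - 1] += 1          # take x_a, sign unchanged
            else:
                exp[b - 1] += 1          # take -x_b, sign flips
                sign = -sign
        t = tuple(exp)
        coeffs[t] = coeffs.get(t, 0) + sign
    return coeffs

def cycle(m):    # (x_1 - x_2)(x_2 - x_3)...(x_{m-1} - x_m)(x_m - x_1)
    return [(i, i + 1) for i in range(1, m)] + [(m, 1)]

assert expand(cycle(4), 4)[(1, 1, 1, 1)] == 2      # C_4
assert expand(cycle(6), 6)[(1,) * 6] == 2          # C_6
assert expand(cycle(3), 3).get((1, 1, 1), 0) == 0  # odd cycle: no luck
```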

3. The Graph Polynomial, and Directed Orientations

Motivated by Example 10, we wish to apply a similar technique to general graphs {G}. So in what follows, let {G} be a (simple) graph with vertex set {\{1, \dots, n\}}.

Definition 11

The graph polynomial of {G} is defined by

\displaystyle f_G(x_1, \dots, x_n) = \prod_{\substack{(i,j) \in E(G) \\ i < j}} (x_i-x_j).

We observe that coefficients of {f_G} correspond to differences in directed orientations. To be precise, we introduce the notation:

Definition 12

Consider orientations on the graph {G} with vertex set {\{1, \dots, n\}}, meaning we assign a direction {v \rightarrow w} to every edge of {G} to make it into a directed graph. An oriented edge is called ascending if {v \rightarrow w} and {v < w}, i.e. the edge points from the smaller number to the larger one.

Then we say that an orientation is

  • even if there are an even number of ascending edges, and
  • odd if there are an odd number of ascending edges.

Finally, we define

  • {\mathop{\mathrm{DE}}_G(d_1, \dots, d_n)} to be the set of all even orientations of {G} in which vertex {i} has indegree {d_i}.
  • {\mathop{\mathrm{DO}}_G(d_1, \dots, d_n)} to be the set of all odd orientations of {G} in which vertex {i} has indegree {d_i}.

Set {\mathop{\mathrm{D}}_G(d_1,\dots,d_n) = \mathop{\mathrm{DE}}_G(d_1,\dots,d_n) \cup \mathop{\mathrm{DO}}_G(d_1,\dots,d_n)}.

Example 13

Consider the following orientation:

There are exactly two ascending edges, namely {1 \rightarrow 2} and {2 \rightarrow 4}. The indegrees are {d_1 = 0}, {d_2 = 2} and {d_3 = d_4 = 1}. Therefore, this particular orientation is an element of {\mathop{\mathrm{DE}}_G(0,2,1,1)}. In terms of {f_G}, this corresponds to the choice of terms

\displaystyle \left( x_1- \boldsymbol{x_2} \right) \left( \boldsymbol{x_2}-x_3 \right) \left( x_2-\boldsymbol{x_4} \right) \left( \boldsymbol{x_3}-x_4 \right)

which is a {+ x_2^2 x_3 x_4} term.

Lemma 14

In the graph polynomial of {G}, the coefficient of {x_1^{d_1} \dots x_n^{d_n}} is

\displaystyle \left\lvert \mathop{\mathrm{DE}}_G(d_1, \dots, d_n) \right\rvert - \left\lvert \mathop{\mathrm{DO}}_G(d_1, \dots, d_n) \right\rvert.

Proof: Consider expanding {f_G}. Then each expanded term corresponds to a choice of {x_i} or {x_j} from each factor {(i,j)}, as in Example 13. The term has coefficient {+1} if the orientation is even, and {-1} if the orientation is odd, as desired. \Box

Thus we have an explicit combinatorial description of the coefficients in the graph polynomial {f_G}.
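Lemma 14 can be illustrated concretely on the four-vertex graph of Example 13. This Python sketch (my own) expands {f_G} and, in the same pass, tallies orientations by indegree sequence and parity:

```python
from itertools import product

# the four-vertex graph from Example 13
edges = [(1, 2), (2, 3), (2, 4), (3, 4)]
n = 4

coeffs, de, do = {}, {}, {}
for choice in product((0, 1), repeat=len(edges)):
    exp, sign, asc = [0] * n, 1, 0
    for (i, j), c in zip(edges, choice):    # factor (x_i - x_j) with i < j
        head = j if c else i                # the picked variable is the head
        exp[head - 1] += 1                  # so exponents record indegrees
        if c:                               # picked -x_j: sign flips, and the
            sign = -sign                    # edge i -> j is ascending
            asc += 1
    t = tuple(exp)
    coeffs[t] = coeffs.get(t, 0) + sign
    parity = de if asc % 2 == 0 else do
    parity[t] = parity.get(t, 0) + 1

# Lemma 14: each coefficient equals |DE_G| - |DO_G|
assert all(coeffs[t] == de.get(t, 0) - do.get(t, 0) for t in coeffs)
# the indegree sequence (0,2,1,1) of Example 13 admits one even and one odd
# orientation, so the +x_2^2 x_3 x_4 term there is cancelled in f_G
assert de[(0, 2, 1, 1)] == 1 and do[(0, 2, 1, 1)] == 1
```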

4. Coefficients via Eulerian Suborientations

We now give a second description of the coefficients of {f_G}.

Definition 15

Let {D \in \mathop{\mathrm{D}}_G(d_1, \dots, d_n)}, viewed as a directed graph. An Eulerian suborientation of {D} is a subgraph of {D} (not necessarily induced) in which every vertex has equal indegree and outdegree. We say that such a suborientation is

  • even if it has an even number of edges, and
  • odd if it has an odd number of edges.

Note that the empty suborientation is allowed. We denote the even and odd Eulerian suborientations of {D} by {\mathop{\mathrm{EE}}(D)} and {\mathop{\mathrm{EO}}(D)}, respectively.

Eulerian suborientations are brought into the picture by the following lemma.

Lemma 16

Assume {D \in \mathop{\mathrm{DE}}_G(d_1, \dots, d_n)}. Then there are natural bijections

\displaystyle \begin{aligned} \mathop{\mathrm{DE}}_G(d_1, \dots, d_n) &\rightarrow \mathop{\mathrm{EE}}(D) \\ \mathop{\mathrm{DO}}_G(d_1, \dots, d_n) &\rightarrow \mathop{\mathrm{EO}}(D). \end{aligned}

Similarly, if {D \in \mathop{\mathrm{DO}}_G(d_1, \dots, d_n)} then there are bijections

\displaystyle \begin{aligned} \mathop{\mathrm{DE}}_G(d_1, \dots, d_n) &\rightarrow \mathop{\mathrm{EO}}(D) \\ \mathop{\mathrm{DO}}_G(d_1, \dots, d_n) &\rightarrow \mathop{\mathrm{EE}}(D). \end{aligned}

Proof: Consider any orientation {D' \in \mathop{\mathrm{D}}_G(d_1, \dots, d_n)}. Then we define a suborientation of {D}, denoted {D \rtimes D'}, by including exactly the edges of {D} whose orientation in {D'} is in the opposite direction. It’s easy to see that this induces a bijection

\displaystyle D \rtimes - : \mathop{\mathrm{D}}_G(d_1, \dots, d_n) \rightarrow \mathop{\mathrm{EE}}(D) \cup \mathop{\mathrm{EO}}(D)

Moreover, remark that

  • {D \rtimes D'} is even if {D} and {D'} are either both even or both odd, and
  • {D \rtimes D'} is odd otherwise.

The lemma follows from this. \Box

Corollary 17

In the graph polynomial of {G}, the coefficient of {x_1^{d_1} \dots x_n^{d_n}} is

\displaystyle \pm \left( \left\lvert \mathop{\mathrm{EE}}(D) \right\rvert - \left\lvert \mathop{\mathrm{EO}}(D) \right\rvert \right)

where {D \in \mathop{\mathrm{D}}_G(d_1, \dots, d_n)} is arbitrary.

Proof: Combine Lemma 14 and Lemma 16. \Box

We now arrive at the main result:

Theorem 18

Let {G} be a graph on {\{1, \dots, n\}}, and let {D \in \mathop{\mathrm{D}}_G(d_1, \dots, d_n)} be an orientation of {G}. If {\left\lvert \mathop{\mathrm{EE}}(D) \right\rvert \neq \left\lvert \mathop{\mathrm{EO}}(D) \right\rvert}, then given a list of {d_i+1} colors at each vertex of {G}, there exists a proper coloring of {G} assigning each vertex a color from its own list.

In particular, {G} is {(1+\max_i d_i)}-choosable.

Proof: Combine Corollary 17 with Theorem 9. \Box

5. Finding an orientation

Armed with Theorem 18, we are almost ready to prove Theorem 7. The last ingredient is that we need to find an orientation of {G} in which the maximum indegree is not too large. This is accomplished by the following.

Lemma 19

Let {L(G) \overset{\mathrm{def}}{=} \max_{H \subseteq G} |E(H)|/|V(H)|} as in Theorem 7. Then {G} has an orientation in which every indegree is at most {\left\lceil L(G) \right\rceil}.

Proof: This is an application of Hall’s marriage theorem.

Let {d = \left\lceil L(G) \right\rceil \ge L(G)}. Construct a bipartite graph

\displaystyle E \cup X \qquad \text{where}\qquad E = E(G) \quad\text{ and }\quad X = \underbrace{V(G) \sqcup \dots \sqcup V(G)}_{d \text{ times}}.

Connect {e \in E} and {v \in X} if {v} is an endpoint of {e}. Hall’s condition holds: a subset {E' \subseteq E} of edges spans a subgraph {H \subseteq G}, and the neighborhood of {E'} consists of the {d} copies of each vertex of {H}, hence has size {d \left\lvert V(H) \right\rvert \ge L(G) \left\lvert V(H) \right\rvert \ge \left\lvert E(H) \right\rvert \ge |E'|}. So we can match each edge in {E} to a (copy of some) vertex in {X}; directing each edge towards its matched vertex gives an orientation. Since there are exactly {d} copies of each vertex in {X}, every indegree is at most {d}. \Box
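Here is a minimal sketch of this matching argument in code (my own illustration; the input {K_4} is a hypothetical example, not one from the post):

```python
import math
from itertools import combinations

# Orient the edges of K4 with all indegrees <= ceil(L(G)), following Lemma 19.
edges = [(u, v) for u in range(4) for v in range(u + 1, 4)]  # the 6 edges of K4

def L_of(edges, nverts):
    """max over nonempty vertex subsets H of |E(H)| / |V(H)|."""
    best = 0.0
    for r in range(1, nverts + 1):
        for H in combinations(range(nverts), r):
            Hs = set(H)
            e = sum(1 for (u, v) in edges if u in Hs and v in Hs)
            best = max(best, e / r)
    return best

d = math.ceil(L_of(edges, 4))          # L(K4) = 6/4, so d = 2

# Bipartite graph: left = edges, right = d copies (v, c) of each vertex v.
match = {}                             # right node -> index of matched edge

def augment(i, seen):
    """Standard augmenting-path step: try to match edge i to a vertex copy."""
    u, v = edges[i]
    for r in ((w, c) for w in (u, v) for c in range(d)):
        if r not in seen:
            seen.add(r)
            if r not in match or augment(match[r], seen):
                match[r] = i
                return True
    return False

for i in range(len(edges)):
    assert augment(i, set())           # Hall's condition guarantees success

# Direct each edge toward its matched vertex; indegrees are then at most d.
head = {i: w for (w, c), i in match.items()}
indeg = [sum(1 for i in head if head[i] == x) for x in range(4)]
assert len(head) == len(edges) and max(indeg) <= d
```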

Now we can prove Theorem 7. Proof: Using Lemma 19, pick {D \in \mathop{\mathrm{D}}_G(d_1, \dots, d_n)} with {\max d_i \le \left\lceil L(G) \right\rceil}. Since {G} is bipartite it has no odd cycles, and every Eulerian suborientation decomposes into directed cycles; hence {\mathop{\mathrm{EO}}(D) = \varnothing}. So Theorem 18 applies and we are done. \Box

A Sketchy Overview of Green-Tao

These are the notes of my last lecture in the 18.099 discrete analysis seminar: a very high-level overview of the Green-Tao theorem, essentially a subset of this paper.

1. Synopsis

This post is an overview of the proof of:

Theorem 1 (Green-Tao)

The prime numbers contain arbitrarily long arithmetic progressions.

Here, Szemerédi’s theorem isn’t strong enough, because the primes have density approaching zero. Instead, one can try to prove the following “relative” result.

Theorem (Relative Szemerédi)

Let {S} be a sparse “pseudorandom” set of integers. Then any subset {A \subseteq S} with positive relative density in {S} has arbitrarily long arithmetic progressions.

In order to do this, we have to accomplish the following.

  • Make precise the notion of “pseudorandom”.
  • Prove the Relative Szemerédi theorem, and then
  • Exhibit a “pseudorandom” set {S} which subsumes the prime numbers.

This post will use the graph-theoretic approach to Szemerédi as in the exposition of David Conlon, Jacob Fox, and Yufei Zhao. In order to motivate the notion of pseudorandomness, we return to the graph-theoretic proof of Roth’s theorem, i.e. the case {k=3} of Szemerédi’s theorem.

2. Defining the linear forms condition

2.1. Review of Roth’s theorem

Roth’s theorem can be phrased in two ways. The first is the “set-theoretic” formulation:

Theorem 2 (Roth, set version)

If {A \subseteq \mathbb Z/N} is 3-AP-free, then {|A| = o(N)}.

The second is a “weighted” version

Theorem 3 (Roth, weighted version)

Fix {\delta > 0}. Let {f : \mathbb Z/N \rightarrow [0,1]} with {\mathbf E f \ge \delta}. Then

\displaystyle \Lambda_3(f,f,f) \ge \Omega_\delta(1).

We sketch the idea of a graph-theoretic proof of the first theorem. We construct a tripartite graph {G_A} on vertices {X \sqcup Y \sqcup Z}, where {X = Y = Z = \mathbb Z/N}. Then one creates the edges

  • {(x,y)} if {2x+ y \in A},
  • {(x,z)} if {x-z \in A}, and
  • {(y,z)} if {-y-2z \in A}.

This construction is selected so that arithmetic progressions in {A} correspond to triangles in the graph {G_A}. As a result, if {A} has no 3-AP’s (except trivial ones, where {x+y+z=0}), then every edge of {G_A} lies in exactly one triangle. Then we can apply the Ruzsa–Szemerédi theorem, which states that a graph on {n} vertices in which every edge lies in exactly one triangle has {o(n^2)} edges; applied with {n = 3N}, this gives {|A| = o(N)}.
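To make the correspondence concrete, here is a tiny brute-force verification (my own; the choices of {N} and {A} are arbitrary): every triangle {(x,y,z)} of {G_A} yields the 3-AP {(2x+y,\ x-z,\ -y-2z)} in {A}, since {(2x+y) + (-y-2z) = 2(x-z)}.

```python
# Check that triangles in G_A correspond to 3-APs in A (illustrative parameters).
N = 13
A = {1, 2, 4}                         # an arbitrary small subset of Z/13

triangles = 0
for x in range(N):
    for y in range(N):
        for z in range(N):
            a, b, c = (2*x + y) % N, (x - z) % N, (-y - 2*z) % N
            if a in A and b in A and c in A:      # all three edges present
                triangles += 1
                assert (a + c) % N == (2*b) % N   # a, b, c is a 3-AP mod N
```

Note that here all triangles found are the trivial ones (since {A = \{1,2,4\}} has no nontrivial 3-AP mod {13}), each appearing with multiplicity {N}.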

2.2. The measure {\nu}

Now for the generalized version, we start with the second version of Roth’s theorem. Instead of a set {S}, we consider a function

\displaystyle \nu : \mathbb Z/N \rightarrow \mathbb R_{\ge 0}

which we call a majorizing measure. Since we are now dealing with {A} of low density, we normalize {\nu} so that

\displaystyle \mathbf E[\nu] = 1 + o(1).

Our goal is now to show a result of the form:

Theorem (Relative Roth, informally, weighted version)

If {0 \le f \le \nu}, {\mathbf E f \ge \delta}, and {\nu} satisfies a “pseudorandom” condition, then {\Lambda_3(f,f,f) \ge \Omega_{\delta}(1)}.

The prototypical example of course is that if {A \subseteq S \subseteq \mathbb Z/N}, then we let {\nu(x) = \frac{N}{|S|} 1_S(x)}.

2.3. Pseudorandomness for {k=3}

So what should the pseudorandomness condition be? First consider the tripartite graph {G_S} defined earlier, and let {p = |S| / N}; since {S} is sparse, we expect {p} to be small. The main idea, which turns out to be correct, is: the number of embeddings of {K_{2,2,2}} in {G_S} should be “as expected”, namely {(1+o(1)) p^{12} N^6}. Here {K_{2,2,2}} is the {2}-blow-up of a triangle. This condition gives us control over the distribution of triangles in the sparse graph {G_S}: knowing that we have approximately the correct count of {K_{2,2,2}}’s is enough to control the distribution of triangles.

For technical reasons, in fact we want this to be true not only for {K_{2,2,2}} but all of its subgraphs {H}.

Now, let’s move on to the weighted version. Consider a weighted tripartite graph, which we can think of as a collection of three functions

\displaystyle \begin{aligned} \mu_{-z} &: X \times Y \rightarrow \mathbb R \\ \mu_{-y} &: X \times Z \rightarrow \mathbb R \\ \mu_{-x} &: Y \times Z \rightarrow \mathbb R. \end{aligned}

We think of {\mu} as normalized so that {\mathbf E[\mu_{-x}] = \mathbf E[\mu_{-y}] = \mathbf E[\mu_{-z}] = 1}. Then we can define

Definition 4

A weighted tripartite graph {\mu = (\mu_{-x}, \mu_{-y}, \mu_{-z})} satisfies the {3}-linear forms condition if

\displaystyle \begin{aligned} \mathbf E_{x^0,x^1,y^0,y^1,z^0,z^1} &\Big[ \mu_{-x}(y^0,z^0) \mu_{-x}(y^0,z^1) \mu_{-x}(y^1,z^0) \mu_{-x}(y^1,z^1) \\ & \mu_{-y}(x^0,z^0) \mu_{-y}(x^0,z^1) \mu_{-y}(x^1,z^0) \mu_{-y}(x^1,z^1) \\ & \mu_{-z}(x^0,y^0) \mu_{-z}(x^0,y^1) \mu_{-z}(x^1,y^0) \mu_{-z}(x^1,y^1) \Big] \\ &= 1 + o(1) \end{aligned}

and similarly if any of the twelve factors are deleted.

The pseudorandomness condition is then stated in terms of the graph we defined above:

Definition 5

A function {\nu : \mathbb Z/N \rightarrow \mathbb R_{\ge 0}} satisfies the {3}-linear forms condition if {\mathbf E[\nu] = 1 + o(1)}, and the weighted tripartite graph {\mu = (\mu_{-x}, \mu_{-y}, \mu_{-z})} defined by

\displaystyle \begin{aligned} \mu_{-z}(x,y) &= \nu(2x+y) \\ \mu_{-y}(x,z) &= \nu(x-z) \\ \mu_{-x}(y,z) &= \nu(-y-2z) \end{aligned}

satisfies the {3}-linear forms condition.

Finally, the relative version of Roth’s theorem which we seek is:

Theorem 6 (Relative Roth)

Suppose {\nu : \mathbb Z/N \rightarrow \mathbb R_{\ge 0}} satisfies the {3}-linear forms condition. Then for any {f : \mathbb Z/N \rightarrow \mathbb R_{\ge 0}} bounded above by {\nu} and satisfying {\mathbf E[f] \ge \delta > 0}, we have

\displaystyle \Lambda_3(f,f,f) \ge \Omega_{\delta}(1).

2.4. Relative Szemerédi

We of course have:

Theorem 7 (Szemerédi)

Suppose {k \ge 3}, and {f : \mathbb Z/N \rightarrow [0,1]} with {\mathbf E[f] \ge \delta}. Then

\displaystyle \Lambda_k(f, \dots, f) \ge \Omega_{\delta}(1).

For {k > 3}, rather than considering weighted tripartite graphs, we consider a {(k-1)}-uniform {k}-partite hypergraph. For example, given {\nu} with {\mathbf E[\nu] = 1 + o(1)} and {k=4}, we use the construction

\displaystyle \begin{aligned} \mu_{-z}(w,x,y) &= \nu(3w+2x+y) \\ \mu_{-y}(w,x,z) &= \nu(2w+x-z) \\ \mu_{-x}(w,y,z) &= \nu(w-y-2z) \\ \mu_{-w}(x,y,z) &= \nu(-x-2y-3z). \end{aligned}

Thus 4-AP’s correspond to the simplex {K_4^{(3)}} (i.e. a tetrahedron). We then consider the two-blow-up of this simplex, and require the corresponding count to be {1 + o(1)} for all subgraphs {H} of the blow-up.

Here is the compiled version:

Definition 8

A {(k-1)}-uniform {k}-partite weighted hypergraph {\mu = (\mu_{-i})_{i=1}^k} satisfies the {k}-linear forms condition if

\displaystyle \mathbf E_{x_1^0, x_1^1, \dots, x_k^0, x_k^1} \left[ \prod_{j=1}^k \prod_{\omega \in \{0,1\}^{[k] \setminus \{j\}}} \mu_{-j}\left( x_1^{\omega_1}, \dots, x_{j-1}^{\omega_{j-1}}, x_{j+1}^{\omega_{j+1}}, \dots, x_k^{\omega_k} \right)^{n_{j,\omega}} \right] = 1 + o(1)

for all exponents {n_{j,\omega} \in \{0,1\}}.

Definition 9

A function {\nu : \mathbb Z/N \rightarrow \mathbb R_{\ge 0}} satisfies the {k}-linear forms condition if {\mathbf E[\nu] = 1 + o(1)}, and

\displaystyle \mathbf E_{x_1^0, x_1^1, \dots, x_k^0, x_k^1} \left[ \prod_{j=1}^k \prod_{\omega \in \{0,1\}^{[k] \setminus \{j\}}} \nu\left( \sum_{i=1}^k (j-i)x_i^{(\omega_i)} \right)^{n_{j,\omega}} \right] = 1 + o(1)

for all exponents {n_{j,\omega} \in \{0,1\}}. This is just the previous condition with the natural {\mu} induced by {\nu}.
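As a quick consistency check (mine, not from the notes), the coefficient vectors {(j-i)_{i=1}^k} appearing in Definition 9 really do recover the explicit forms written earlier for {k=3} and {k=4}:

```python
# The form attached to index j is sum_i (j - i) x_i; check that it matches the
# explicit k = 3 and k = 4 constructions given earlier in the post.
def forms(k):
    return [tuple(j - i for i in range(1, k + 1)) for j in range(1, k + 1)]

# k = 3, variables (x, y, z):
assert forms(3) == [(0, -1, -2),   # j=1: -y - 2z
                    (1, 0, -1),    # j=2:  x - z
                    (2, 1, 0)]     # j=3: 2x + y

# k = 4, variables (w, x, y, z):
assert forms(4) == [(0, -1, -2, -3),   # -x - 2y - 3z
                    (1, 0, -1, -2),    #  w - y - 2z
                    (2, 1, 0, -1),     # 2w + x - z
                    (3, 2, 1, 0)]      # 3w + 2x + y
```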

The natural generalization of relative Szemerédi is then:

Theorem 10 (Relative Szemerédi)

Suppose {k \ge 3}, and {\nu : \mathbb Z/N \rightarrow \mathbb R_{\ge 0}} satisfies the {k}-linear forms condition. Let {f : \mathbb Z/N \rightarrow \mathbb R_{\ge 0}} with {\mathbf E[f] \ge \delta} and {f \le \nu}. Then

\displaystyle \Lambda_k(f, \dots, f) \ge \Omega_{\delta}(1).

3. Outline of proof of Relative Szemerédi

The proof of Relative Szemerédi uses two key facts. First, one replaces {f} with a bounded function {\widetilde f} which is close to it:

Theorem 11 (Dense model)

Let {\varepsilon > 0}. There exists {\varepsilon' > 0} such that if:

  • {\nu : \mathbb Z/N \rightarrow \mathbb R_{\ge 0}} satisfies {\left\lVert \nu-1 \right\rVert^{\square}_r \le \varepsilon'}, and
  • {f : \mathbb Z/N \rightarrow \mathbb R_{\ge 0}}, {f \le \nu}

then there exists a function {\widetilde f : \mathbb Z/N \rightarrow [0,1]} such that {\left\lVert f - \widetilde f \right\rVert^{\square}_r \le \varepsilon}.

Here we have a new norm, called the cut norm, defined by

\displaystyle \left\lVert f \right\rVert^{\square}_r = \sup_{A_i \subseteq (\mathbb Z/N)^{r-1}} \left\lvert \mathbf E_{x_1, \dots, x_r} f(x_1 + \dots + x_r) 1_{A_1}(x_{-1}) \dots 1_{A_r}(x_{-r}) \right\rvert.

This is actually an extension of the cut norm defined on an {r}-uniform {r}-partite weighted hypergraph (not {(r-1)}-uniform like before!): if {g : X_1 \times \dots \times X_r \rightarrow \mathbb R} is such a weighted hypergraph, we let

\displaystyle \left\lVert g \right\rVert^{\square}_{r,r} = \sup_{A_i \subseteq X_{-i}} \left\lvert \mathbf E_{x_1, \dots, x_r} \left[ g(x_1, \dots, x_r) 1_{A_1}(x_{-1}) \dots 1_{A_r}(x_{-r}) \right] \right\rvert.

Taking {g(x_1, \dots, x_r) = f(x_1 + \dots + x_r)}, {X_1 = \dots = X_r = \mathbb Z/N} gives the analogy.

For the second theorem, we define the norm

\displaystyle \left\lVert g \right\rVert^{\square}_{k-1,k} = \max_{i=1,\dots,k} \left( \left\lVert g_{-i} \right\rVert^{\square}_{k-1, k-1} \right).

Theorem 12 (Relative simplex counting lemma)

Let {\mu}, {g}, {\widetilde g} be {(k-1)}-uniform {k}-partite weighted hypergraphs on {X_1 \cup \dots \cup X_k}. Assume that {\mu} satisfies the {k}-linear forms condition, {0 \le g_{-i} \le \mu_{-i}} for all {i}, and {0 \le \widetilde g \le 1}. If {\left\lVert g-\widetilde g \right\rVert^{\square}_{k-1,k} = o(1)} then

\displaystyle \mathbf E_{x_1, \dots, x_k} \left[ g(x_{-1}) \dots g(x_{-k}) - \widetilde g(x_{-1}) \dots \widetilde g(x_{-k}) \right] = o(1).

One then combines these two results to prove Relative Szemerédi, as follows. Start with {f} and {\nu} as in the theorem. The {k}-linear forms condition turns out to imply {\left\lVert \nu-1 \right\rVert^{\square}_{k-1} = o(1)}, so we can find a nearby {\widetilde f} by the dense model theorem. Then, we induce {\mu}, {g}, {\widetilde g} from {\nu}, {f}, {\widetilde f} respectively. The counting lemma then reduces the bounding of {\Lambda_k(f, \dots, f)} to the bounding of {\Lambda_k(\widetilde f, \dots, \widetilde f)}, which is {\Omega_\delta(1)} by the usual Szemerédi theorem.

4. Arithmetic progressions in primes

We now sketch how to obtain Green-Tao from Relative Szemerédi. As expected, we need to use the von Mangoldt function {\Lambda}.

Unfortunately, {\Lambda} is biased (e.g. “all decent primes are odd”). To get around this, we let {w = w(N)} tend to infinity slowly with {N}, and define

\displaystyle W = \prod_{p \le w} p.

In the {W}-trick we consider only primes {1 \pmod W}. The modified von Mangoldt function then is defined by

\displaystyle \widetilde \Lambda(n) = \begin{cases} \frac{\varphi(W)}{W} \log (Wn+1) & Wn+1 \text{ prime} \\ 0 & \text{else}. \end{cases}

By Dirichlet’s theorem on primes in arithmetic progressions, we have {\sum_{n \le N} \widetilde \Lambda(n) = N + o(N)}.
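As a numeric illustration (my own; the parameters {w = 3} and {N = 2000} are arbitrary), one can verify that {\sum_{n \le N} \widetilde\Lambda(n)} is indeed close to {N}:

```python
import math

# Numeric check of sum_{n <= N} Lambda~(n) ~ N for W = 2 * 3 = 6 (i.e. w = 3).
W, N = 6, 2000
phi_W = 2                                  # Euler phi(6) = 2

limit = W * N + 1
sieve = [True] * (limit + 1)               # simple sieve of Eratosthenes
sieve[0] = sieve[1] = False
for i in range(2, int(limit ** 0.5) + 1):
    if sieve[i]:
        for j in range(i * i, limit + 1, i):
            sieve[j] = False

total = sum(phi_W / W * math.log(W * n + 1)
            for n in range(1, N + 1) if sieve[W * n + 1])
assert abs(total / N - 1) < 0.1            # matches Dirichlet's prediction
```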

So, we now need to show the following.

Proposition 13

Fix {k \ge 3}. We can find {\delta = \delta(k) > 0} such that for {N \gg 1} prime, we can find {\nu : \mathbb Z/N \rightarrow \mathbb R_{\ge 0}} which satisfies the {k}-linear forms condition as well as

\displaystyle \nu(n) \ge \delta \widetilde \Lambda(n)

for {N/2 \le n < N}.

In that case, we can let

\displaystyle f(n) = \begin{cases} \delta \widetilde\Lambda(n) & N/2 \le n < N \\ 0 & \text{else}. \end{cases}

Then {0 \le f \le \nu}. The restriction {N/2 \le n < N} allows us to avoid “wrap-around issues” that arise from using {\mathbb Z/N} instead of {\mathbb Z}. Relative Szemerédi then yields the result.

For completeness, we state the construction. Let {\chi : \mathbb R \rightarrow [0,1]} be a smooth function supported on {[-1,1]} with {\chi(0) = 1}, and define a normalizing constant {c_\chi = \int_0^\infty \left\lvert \chi'(x) \right\rvert^2 \; dx}. Inspired by {\Lambda(n) = \sum_{d \mid n} \mu(d) \log(n/d)}, we define a truncated {\Lambda} by

\displaystyle \Lambda_{\chi, R}(n) = \log R \sum_{d \mid n} \mu(d) \chi\left( \frac{\log d}{\log R} \right).

Let {k \ge 3}, {R = N^{k^{-1} 2^{-k-3}}}. Now, we define {\nu} by

\displaystyle \nu(n) = \begin{cases} \dfrac{\varphi(W)}{W} \dfrac{\Lambda_{\chi,R}(Wn+1)^2}{c_\chi \log R} & N/2 \le n < N \\ 0 & \text{else}. \end{cases}

This turns out to work, provided {w} grows sufficiently slowly in {N}.

Formal vs Functional Series (OR: Generating Function Voodoo Magic)

Epistemic status: highly dubious. I found almost no literature doing anything quite like what follows, which unsettles me because it makes it likely that I’m overcomplicating things significantly.

1. Synopsis

Recently I was working on an elegant problem which was the original Problem 6 proposal for the 2015 International Math Olympiad; it reads as follows:


[IMO Shortlist 2015 Problem C6] Let {S} be a nonempty set of positive integers. We say that a positive integer {n} is clean if it has a unique representation as a sum of an odd number of distinct elements from {S}. Prove that there exist infinitely many positive integers that are not clean.

Proceeding by contradiction, one can prove (try it!) that in fact all sufficiently large integers have exactly one representation as a sum of an even number of distinct elements from {S}. Then, the problem reduces to the following:


Show that if {s_1 < s_2 < \dots} is an increasing sequence of positive integers and {P(x)} is a nonzero polynomial then we cannot have

\displaystyle \prod_{j=1}^\infty (1 - x^{s_j}) = P(x)

as formal series.

To see this, note that each sufficiently large {N} has exactly one representation by an odd subset and one by an even subset, so the coefficient of {x^N} in the product is {1 + (-1) = 0}; hence the product would be a polynomial. Now, the intuitive idea is obvious: the root {1} appears with finite multiplicity in {P}, so we can write {P(x) = (1-x)^k Q(x)} where {Q(1) \neq 0}, and then the factors {1-x^{s_j}} on the left-hand side divide out {1-x} too many times, right?

Well, there are some obvious issues with this “proof”: for example, consider the equality

\displaystyle 1 = (1-x)(1+x)(1+x^2)(1+x^4)(1+x^8) \dots.

The right-hand side is “divisible” by {1-x}, but the left-hand side is not (as a polynomial).

But we still want to use the idea of plugging {x \rightarrow 1^-}, so what is the right thing to do? It turns out that this is a complete minefield, and there are a lot of very subtle distinctions that seem to not be explicitly mentioned in many places. I think I have a complete answer now, but it’s long enough to warrant this entire blog post.

Here’s the short version: there are actually two distinct notions of “generating function”, namely the “formal series” and the “functional series”. They use exactly the same notation but are two different types of objects, and this ends up being the source of lots of errors, because “formal series” do not allow substituting values for {x}, while “functional series” do.

Spoiler: we’ll need the asymptotic for the partition function {p(n)}.

2. Formal Series {\neq} Functional Series

I’m assuming you’ve all heard the definition of {\sum_k c_kx^k}. It turns out unfortunately that this isn’t everything: there are actually two types of objects at play here. They are usually called formal power series and power series, but for this post I will use the more descriptive names formal series and functional series. I’ll do everything over {\mathbb C}, but one can of course use {\mathbb R} instead.

The formal series is easier to describe:

Definition 1

A formal series {F} is an infinite sequence {(a_n)_n = (a_0, a_1, a_2, \dots)} of complex numbers. We often denote it by {\sum a_nx^n = a_0 + a_1x + a_2x^2 + \dots}. The set of formal series is denoted {\mathbb C[ [x] ]}.

This is the “algebraic” viewpoint: it’s a sequence of coefficients. Note that there is no worry about convergence issues or “plugging in {x}”.

On the other hand, a functional series is more involved, because it has to support substitution of values of {x} and worry about convergence issues. So here are the necessary pieces of data:

Definition 2

A functional series {G} (centered at zero) is a function {G : U \rightarrow \mathbb C}, where {U} is an open disk centered at {0} or {U = \mathbb C}. We require that there exists an infinite sequence {(c_0, c_1, c_2, \dots)} of complex numbers satisfying

\displaystyle \forall z \in U: \qquad G(z) = \lim_{N \rightarrow \infty} \left( \sum_{k=0}^N c_k z^k \right).

(The limit is taken in the usual metric of {\mathbb C}.) In that case, the {c_i} are unique and called the coefficients of {G}.

This is often written as {G(x) = \sum_n c_n x^n}, with the open set {U} suppressed.

Remark 3

Some remarks on the definition of functional series:

  • This is enough to imply that {G} is holomorphic (and thus analytic) on {U}.
  • For experts: note that I’m including the domain {U} as part of the data required to specify {G}, which makes the presentation cleaner. Most sources do something with “radius of convergence”; I will blissfully ignore this, leaving this data implicitly captured by {U}.
  • For experts: perhaps non-standardly, I require {U \neq \{0\}}, i.e. a positive radius of convergence. Otherwise I can’t take derivatives, etc.

Thus formal and functional series, despite having the same notation, have different types: a formal series {F} is a sequence, while a functional series {G} is a function that happens to be expressible as an infinite sum within its domain.

Of course, from every functional series {G} we can extract its coefficients and make them into a formal series {F}. So, for lack of better notation:

Definition 4

If {F = (a_n)_n} is a formal series, and {G : U \rightarrow \mathbb C} is a functional series whose coefficients equal {F}, then we write {F \simeq G}.

3. Finite operations

Now that we have formal and functional series, we can define arithmetic operations on them. Since these are different types of objects, we will have to run the definitions in parallel and then check that they respect {\simeq}.

For formal series:

Definition 5

Let {F_1 = (a_n)_n} and {F_2 = (b_n)_n} be formal series. Then we set

\displaystyle \begin{aligned} (a_n)_n \pm (b_n)_n &= (a_n \pm b_n)_n \\ (a_n)_n \cdot (b_n)_n &= \left( \textstyle\sum_{j=0}^n a_jb_{n-j} \right)_n. \end{aligned}

This makes {\mathbb C[ [x] ]} into a ring, with additive identity {(0,0,0,\dots)} and multiplicative identity {(1,0,0,\dots)}.

We also define the derivative of {F = (a_n)_n} by {F' = ((n+1)a_{n+1})_n}.

It’s probably more intuitive to write these definitions as

\displaystyle \begin{aligned} \sum_n a_n x^n \pm \sum_n b_n x^n &= \sum_n (a_n \pm b_n) x^n \\ \left( \sum_n a_n x^n \right) \left( \sum_n b_n x^n \right) &= \sum_n \left( \sum_{j=0}^n a_jb_{n-j} \right) x^n \\ \left( \sum_n a_n x^n \right)' &= \sum_n na_n x^{n-1} \end{aligned}

and in what follows I’ll start to use {\sum_n a_nx^n} more. But officially, all definitions for formal series are in terms of the coefficients alone; the presence of {x} serves as motivation only.
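These coefficient-only definitions are straightforward to implement; here is a minimal sketch of mine, truncating everything to the first {8} coefficients:

```python
# Formal-series arithmetic on coefficient lists, truncated to PREC terms.
PREC = 8

def mul(a, b):
    return [sum(a[j] * b[n - j] for j in range(n + 1)) for n in range(PREC)]

def deriv(a):
    return [(n + 1) * a[n + 1] for n in range(PREC - 1)] + [0]

def inverse(a):
    # A multiplicative inverse exists iff a[0] != 0 (cf. Exercise 6 below);
    # its coefficients are determined one at a time by the recursion here.
    assert a[0] != 0
    b = [1 / a[0]] + [0] * (PREC - 1)
    for n in range(1, PREC):
        b[n] = -sum(a[j] * b[n - j] for j in range(1, n + 1)) / a[0]
    return b

one_minus_x = [1, -1] + [0] * (PREC - 2)
geom = inverse(one_minus_x)                  # coefficients of 1/(1-x)
assert geom == [1] * PREC                    # 1 + x + x^2 + ...
assert mul(one_minus_x, geom) == [1] + [0] * (PREC - 1)
```

Note that nothing here ever evaluates anything at a point {x}; we only ever manipulate sequences.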

Exercise 6

Show that if {F = \sum_n a_nx^n} is a formal series, then it has a multiplicative inverse if and only if {a_0 \neq 0}.

On the other hand, with functional series, the above operations are even simpler:

Definition 7

Let {G_1 : U \rightarrow \mathbb C} and {G_2 : U \rightarrow \mathbb C} be functional series with the same domain {U}. Then {G_1 \pm G_2} and {G_1 \cdot G_2} are defined pointwise.

If {G : U \rightarrow \mathbb C} is a functional series (hence holomorphic), then {G'} is defined pointwise.

If {G} is nonvanishing on {U}, then {1/G : U \rightarrow \mathbb C} is defined pointwise (and otherwise is not defined).

Now, for these finite operations, everything works as you expect:

Theorem 8 (Compatibility of finite operations)

Suppose {F}, {F_1}, {F_2} are formal series, and {G}, {G_1}, {G_2} are functional series {U \rightarrow \mathbb C}. Assume {F \simeq G}, {F_1 \simeq G_1}, {F_2 \simeq G_2}.

  • {F_1 \pm F_2 \simeq G_1 \pm G_2} and {F_1 \cdot F_2 \simeq G_1 \cdot G_2}.
  • {F' \simeq G'}.
  • If {1/G} is defined, then {1/F} is defined and {1/F \simeq 1/G}.

So far so good: everything is compatible, as long as we restrict ourselves to finite operations. But once we step beyond that, things begin to go haywire.

4. Limits

We need to start considering limits of {(F_k)_k} and {(G_k)_k}, since we are trying to make progress towards infinite sums and products. Once we do this, things start to burn.

Definition 9

Let {F_1 = \sum_n a_n x^n} and {F_2 = \sum_n b_n x^n} be formal series, and define the difference by

\displaystyle d(F_1, F_2) = \begin{cases} 2^{-n} & a_n \neq b_n, \; n \text{ minimal} \\ 0 & F_1 = F_2. \end{cases}

This function makes {\mathbb C[[x]]} into a metric space, so we can discuss limits in this space. (In fact it is a normed vector space, with norm given by {\left\lVert F \right\rVert = d(F,0)}.)

Thus, {\lim_{k \rightarrow \infty} F_k = F} if each coefficient of {x^n} eventually stabilizes as {k \rightarrow \infty}. For example, as formal series, the sequence {(1,-1,0,0,\dots)}, {(1,0,-1,0,\dots)}, {(1,0,0,-1,\dots)}, … converges to {1 = (1,0,0,0,\dots)}, which we write as

\displaystyle \lim_{k \rightarrow \infty} (1 - x^k) = 1 \qquad \text{as formal series}.
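This metric is easy to compute on (truncated) coefficient lists; a quick sketch of mine:

```python
# d(F1, F2) = 2^{-n}, where n is the first index at which coefficients differ.
def d(F1, F2):
    for n in range(max(len(F1), len(F2))):
        a = F1[n] if n < len(F1) else 0   # pad shorter list with zeros
        b = F2[n] if n < len(F2) else 0
        if a != b:
            return 2.0 ** (-n)
    return 0.0

# The formal series 1 - x^k approaches 1 as k grows:
assert d([1, 0, 0, -1], [1]) == 0.125     # they first differ at x^3
assert d([1], [1, 0, 0]) == 0.0           # trailing zeros don't matter
```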

As for functional series, since they are functions on the same open set {U}, we can use pointwise convergence or the stronger uniform convergence; we’ll say explicitly which one we’re doing.

Example 10 (Limits don’t work at all)

In what follows, {F_k \simeq G_k} for every {k}.

  • Here is an example showing that if {\lim_k F_k = F}, the functions {G_k} may not converge even pointwise. Indeed, just take {F_k = 1 - x^k} as before, and let {U = \{ z : |z| < 2 \}}.
  • Here is an example showing that even if {G_k \rightarrow G} uniformly, {\lim_k F_k} may not exist. Take {G_k \equiv 1 - 1/k} as constant functions. Then {G_k \rightarrow 1} uniformly, but {\lim_k F_k} doesn’t exist, because the constant term never stabilizes.
  • The following example from this math.SE answer by Robert Israel shows that it’s possible that {F = \lim_k F_k} exists, and {G_k \rightarrow G} pointwise, and still {F \not\simeq G}. Let {U} be the open unit disk, and set

    \displaystyle \begin{aligned} A_k &= \{z = r e^{i\theta} \mid 2/k \le r \le 1, \; 0 \le \theta \le 2\pi - 1/k\} \\ B_k &= \left\{ |z| \le 1/k \right\} \end{aligned}

    for {k \ge 1}. By Runge’s theorem there’s a polynomial {p_k(z)} such that

    \displaystyle |p_k(z) - 1/z^{k}| < 1/k \text{ on } A_k \qquad \text{and} \qquad |p_k(z)| < 1/k \text{ on }B_k.


    \displaystyle G_k(z) = z^{k+1} p_k(z)

    is then the desired counterexample (with {F_k} being the sequence of coefficients of {G_k}). Indeed, by construction {\lim_k F_k = 0}, since {\left\lVert F_k \right\rVert \le 2^{-k}} for each {k}. Alas, {|G_k(z) - z| \le 2/k} for {z \in A_k \cup B_k}, so {G_k} converges pointwise to the identity function {G(z) = z}.

To be fair, we do have the following saving grace:

Theorem 11 (Uniform convergence and existence of both limits is sufficient)

Suppose that {G_k \rightarrow G} converges uniformly. Then if {F_k \simeq G_k} for every {k}, and {\lim_k F_k = F}, then {F \simeq G}.

Proof: Here is a proof, adapted from this math.SE answer by Joey Zhou. WLOG {G = 0} (replace each {G_k} by {G_k - G}), and write {G_n(z) = \sum_k a^{(n)}_k z^k}. Letting {a_k} denote the {k}th coefficient of {F}, it suffices to show {a_k = 0} for every {k}. Choose any {r > 0} such that the circle {|z| = r} lies inside {U}. By Cauchy’s integral formula, we have

\displaystyle \begin{aligned} \left\lvert a^{(n)}_k \right\rvert &= \left\lvert \frac{1}{2\pi i} \int\limits_{|z|=r}{\frac{G_n(z) - G(z)}{z^{k+1}}\text{ d}z} \right\rvert \\ & \le \frac{1}{2\pi}(2\pi r)\frac{1}{r^{k+1}}\max\limits_{|z|=r}{\left\lvert G_n(z)-G(z) \right\rvert} \xrightarrow{n\rightarrow\infty} 0 \end{aligned}

since {G_n} converges uniformly to {G = 0} on {U}. Hence {\lim_{n\rightarrow\infty} a^{(n)}_k = 0}. On the other hand, {\lim_n F_n = F} means that {a^{(n)}_k = a_k} for all sufficiently large {n}; so {a_k = 0}, as desired. \Box

The take-away from this section is that limits are relatively poorly behaved.

5. Infinite sums and products

Naturally, infinite sums and products are defined by taking the limit of partial sums and partial products. The following example (from math.SE again) shows the nuances of this behavior.

Example 12 (On {e^{1+x}})

The expression

\displaystyle \sum_{n=0}^\infty \frac{(1+x)^n}{n!} = \lim_{N \rightarrow \infty} \sum_{n=0}^N \frac{(1+x)^n}{n!}

does not make sense as a formal series: the constant term of the partial sum changes with every {N}, so it never stabilizes.

But this does converge (uniformly, even) to a functional series on {U = \mathbb C}, namely to {e^{1+x}}.
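One can watch the constant terms refuse to stabilize numerically (a quick check of mine):

```python
import math

# Constant term of the N-th partial sum of sum_n (1+x)^n / n! is sum_{n<=N} 1/n!.
consts = [sum(1 / math.factorial(n) for n in range(N + 1)) for N in range(8)]

# Strictly increasing at every step (never stabilizes), yet converging to e:
assert all(c1 < c2 for c1, c2 in zip(consts, consts[1:]))
assert abs(consts[-1] - math.e) < 1e-3
```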

Exercise 13

Let {(F_k)_{k \ge 1}} be formal series.

  • Show that an infinite sum {\sum_{k=1}^\infty F_k} converges as formal series exactly when {\lim_k \left\lVert F_k \right\rVert = 0}.
  • Assume for convenience {F_k(0) = 1} for each {k}. Show that an infinite product {\prod_{k=1}^{\infty} F_k} converges as formal series exactly when {\lim_k \left\lVert F_k - 1 \right\rVert = 0}.

Now the upshot is that one example of a convergent formal sum is the expression {\lim_{N} \sum_{n=0}^N a_nx^n} itself! This means we can use standard “radius of convergence” arguments to transfer a formal series into a functional one.

Theorem 14 (Constructing {G} from {F})

Let {F = \sum a_nx^n} be a formal series and let

\displaystyle r = \frac{1}{\limsup_n \sqrt[n]{|a_n|}}.

If {r > 0} then there exists a functional series {G} on {U = \{ |z| < r \}} such that {F \simeq G}.

Proof: Let {F_k} and {G_k} denote the corresponding partial sums {a_0x^0 + \dots + a_kx^k}. Then by the Cauchy–Hadamard theorem, we have {G_k \rightarrow G} uniformly on (compact subsets of) {U}. Also, {\lim_k F_k = F} by construction. \Box
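For instance (my illustration), taking {a_n = 1} for all {n} gives {r = 1}, and on {U = \{ |z| < 1 \}} the partial sums indeed converge to {G(z) = 1/(1-z)}:

```python
# With a_n = 1 the radius is r = 1 / limsup |a_n|^{1/n} = 1; at z = 0.5 the
# partial sums of the geometric series approach G(0.5) = 1 / (1 - 0.5) = 2.
z = 0.5
partial = sum(z ** n for n in range(60))
assert abs(partial - 1 / (1 - z)) < 1e-12
```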

This works less well with products: for example we have

\displaystyle 1 \equiv (1-x) \prod_{j \ge 0} (1+x^{2^j})

as formal series, but we can’t “plug in {x=1}”.

6. Finishing the original problem

We finally return to the original problem: we wish to show that the equality

\displaystyle P(x) = \prod_{j=1}^\infty (1 - x^{s_j})

cannot hold as formal series. Tacitly, this just means

\displaystyle \lim_{N \rightarrow \infty} \prod_{j=1}^N\left( 1 - x^{s_j} \right) = P(x)

as formal series.

Here is a solution, obtained by considering coefficients only, presented by Qiaochu Yuan in this MathOverflow question.

Both sides have constant coefficient {1}, so we may invert them; thus it suffices to show we cannot have

\displaystyle \frac{1}{P(x)} = \frac{1}{\prod_{j=1}^{\infty} (1 - x^{s_j})}

as formal power series.

The coefficients of the LHS have asymptotic growth of the form “polynomial times exponential” (expand {1/P(x)} by partial fractions).

On the other hand, the coefficients of the RHS can be shown to have growth both strictly larger than any polynomial (by truncating the product) and strictly smaller than any exponential (by comparing to the growth rate in the case where {s_j = j}, which gives the partition function {p(n)} mentioned before). So the two rates of growth can’t match.
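In the special case {s_j = j}, the coefficients of the RHS are exactly the partition numbers {p(n)}, which can be computed by the standard dynamic program over {\prod_{j \ge 1} (1-x^j)^{-1}} (my sketch; the asserted values {p(10) = 42} and {p(20) = 627} are classical):

```python
# Partition numbers: coefficients of 1/prod_{j>=1} (1 - x^j), computed by
# letting each part size j contribute its geometric factor in turn.
PREC = 51
p = [1] + [0] * (PREC - 1)
for j in range(1, PREC):
    for n in range(j, PREC):
        p[n] += p[n - j]

assert p[10] == 42 and p[20] == 627
```

Their growth is superpolynomial but subexponential, which is the comparison used in the argument above.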

18.099 Transcript: Bourgain’s Theorem

As part of the 18.099 Discrete Analysis reading group at MIT, I presented section 4.7 of Tao-Vu’s Additive Combinatorics textbook. Here were the notes I used for the second half of my presentation.

1. Synopsis

We aim to prove the following result.

Theorem 1 (Bourgain)

Assume {N \ge 2} is prime and {A, B \subseteq Z = \mathbb Z_N}. Assume that

\displaystyle  \delta \gg \sqrt{\frac{(\log \log N)^3}{\log N}}

is such that {\min\left\{ \mathbf P_ZA, \mathbf P_ZB \right\} \ge \delta}. Then {A+B} contains a proper arithmetic progression of length at least

\displaystyle  \exp\left( C\sqrt[3]{\delta^2 \log N} \right)

for some absolute constant {C > 1}.

The methods that we used with Bohr sets fail here: in the previous half of yesterday’s lecture we took advantage of Parseval’s identity to handle large convolutions, always keeping two {\widehat 1_\ast} terms inside the {\sum} sign. When we work with {A+B}, this is no longer enough. So we instead use the technology of {\Lambda(p)} constants and dissociated sets.

2. Previous results

As usual, let {Z} denote a finite abelian group. Recall that

Definition 2

Let {S \subseteq Z} and {2 \le p \le \infty}. The {\Lambda(p)} constant of {S}, denoted {\left\lVert S \right\rVert_{\Lambda(p)}}, is defined as

\displaystyle  \left\lVert S \right\rVert_{\Lambda(p)} = \sup_{\substack{c : S \rightarrow \mathbb C \\ c \not\equiv 0}} \frac{\left\lVert \displaystyle\sum_{\xi \in S} c(\xi) e(\xi \cdot x) \right\rVert_{L^p(Z)}} {\left\lVert c \right\rVert_{\ell^2(S)}}.

Definition 3

If {S \subseteq Z}, we say {S} is a dissociated set if all {2^{|S|}} subset sums of {S} are distinct.
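For example (a quick check of mine), the powers of {2} form a dissociated set, while {\{1,2,3\}} does not:

```python
from itertools import combinations

def is_dissociated(S, N):
    """True iff all 2^|S| subset sums of S are distinct mod N."""
    sums = set()
    for r in range(len(S) + 1):
        for sub in combinations(S, r):
            sums.add(sum(sub) % N)
    return len(sums) == 2 ** len(S)

assert is_dissociated([1, 2, 4, 8], 101)      # distinct binary expansions
assert not is_dissociated([1, 2, 3], 101)     # 1 + 2 = 3 collides
```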

For such sets we have Rudin’s inequality (yes, Walter), which states:

Lemma 4 (Rudin’s inequality)

If {S} is dissociated then

\displaystyle  \left\lVert S \right\rVert_{\Lambda(p)} \ll \sqrt p.

Dissociated sets come up via the so-called “cube covering lemma”:

Lemma 5 (Cube covering lemma)

Let {S \subseteq Z} and {d \ge 1}. Then we can partition

\displaystyle  S = D_1 \sqcup D_2 \sqcup \dots \sqcup D_k \sqcup R

such that

  • Each {D_i} is dissociated of size {d+1},
  • There exist {\eta_1}, {\dots}, {\eta_d} such that {R} is contained in a {d}-cube, i.e. it is covered by the sums {c_1\eta_1 + \dots + c_d\eta_d}, where {c_i \in \{-1,0,1\}}.

Finally, we remind the reader that

Lemma 6 (Parseval)

We have

\displaystyle  \left\lVert f \right\rVert_{L^2Z} = \left\lVert \widehat f \right\rVert_{\ell^2Z}.

Since we don’t have Bohr sets anymore, the way we detect progressions is to use the pigeonhole principle. In what follows, let {T^n f} denote the shift of {f} by {n}, id est {T^nf(x) = f(x-n)}.

Proposition 7 (Pigeonhole gives arithmetic progressions)

Let {f : Z \rightarrow \mathbb R_{\ge 0}}, {J \ge 1} and suppose {r \in \mathbb Z} is such that

\displaystyle  \mathbf E_Z \max_{1 \le j \le J} \left\lvert T^{jr}f - f \right\rvert < \mathbf E_Z f.

Then {\text{supp }(f)} contains an arithmetic progression of length {J} and common difference {r}.

Proof: Apply the pigeonhole principle to find an {x} such that

\displaystyle  \max_{1 \le j \le J} \left\lvert T^{jr}f(x) - f(x) \right\rvert < f(x).

In particular {f(x) > 0}, and {\left\lvert f(x - jr) - f(x) \right\rvert < f(x)} forces {f(x - jr) > 0} for each {1 \le j \le J}; so {x - r, x - 2r, \dots, x - Jr} (together with {x}) lie in {\text{supp }(f)}. \Box

3. Periodicity

Proposition 8 (Estimate for {\max_{h \in H} |T^hf|} for {\text{supp }(\widehat f)} dissociated)

Let {f : Z \rightarrow \mathbb R}, {\text{supp }(\widehat f) \subseteq S \subseteq Z} with {S} dissociated. Then for any set {H} with {|H| > 1} we have

\displaystyle  \left\lVert \max_{h \in H} \left\lvert T^h f \right\rvert \right\rVert_{L^2Z} \ll \sqrt{\log|H|} \left\lVert f \right\rVert_{L^2Z}.

Proof: Let {p > 2} be large and note

\displaystyle  \begin{aligned} \left\lVert \max_{h \in H} \left\lvert T^h f \right\rvert \right\rVert_{L^2Z} &\le \left\lVert \max_{h \in H} \left\lvert T^h f \right\rvert \right\rVert_{L^pZ} \\ &\le \left\lVert \left( \sum_{h \in H} \left\lvert T^h f \right\rvert^p \right)^{1/p} \right\rVert_{L^pZ} \\ &= \left( \mathbf E_Z \left( \sum_{h \in H} \left\lvert T^h f \right\rvert^p \right) \right)^{1/p} \\ &= \left( \sum_{h \in H} \mathbf E_Z \left\lvert T^h f \right\rvert^p \right)^{1/p} \\ &= \left( \sum_{h \in H} \mathbf E_Z \left\lvert f \right\rvert^p \right)^{1/p} \\ &= \left\lvert H \right\rvert^{1/p} \left\lVert \sum_\xi \widehat f(\xi) e(\xi \cdot x) \right\rVert_{L^pZ} \\ &\le \left\lvert H \right\rvert^{1/p} \left\lVert S \right\rVert_{\Lambda(p)} \left\lVert \widehat f \right\rVert_{\ell^2Z} \\ \end{aligned}

Then by Parseval and Rudin,

\displaystyle  \begin{aligned} \left\lVert \max_{h \in H} \left\lvert T^h f \right\rvert \right\rVert_{L^2Z} &\le \left\lvert H \right\rvert^{1/p} \left\lVert S \right\rVert_{\Lambda(p)} \left\lVert f \right\rVert_{L^2Z} \\ &\ll \left\lvert H \right\rvert^{1/p} \sqrt p \left\lVert f \right\rVert_{L^2Z}. \end{aligned}

We may then take {p \asymp \log |H|}. \Box

We combine the cube covering lemma with the previous proposition into the following lemma, which applies when {\widehat f} has nonzero values of “uniform” size.

Lemma 9 (Uniformity estimate for shifts)

Let {f : Z \rightarrow \mathbb R} and {J, d > 1}. Suppose that {\widehat f} is “uniform in size” across its support, in the sense that

\displaystyle  \frac {\sup_{\xi \in \text{supp }(\widehat f)} \left\lvert \widehat f(\xi) \right\rvert} {\inf_{\xi \in \text{supp }(\widehat f)} \left\lvert \widehat f(\xi) \right\rvert} \le 2016.

Then one can find {S \subseteq Z} such that {|S| = d} and for all {r \in Z},

\displaystyle  \mathbf E_Z \max_{1 \le j \le J} \left\lvert T^{jr}f - f \right\rvert \ll \left( \sum_\xi \left\lvert \widehat f(\xi) \right\rvert \right) \left( \sqrt{\frac{\log J}{d}} + Jd\max_{\eta \in S} \left\lVert \eta \cdot r \right\rVert_{\mathbb R/\mathbb Z} \right).

Proof: Use the cube covering lemma to put {\text{supp }(\widehat f) = D_1 \sqcup \dots \sqcup D_k \sqcup R} where {R} is contained in the cube of {S = \left\{ \eta_1, \dots, \eta_d \right\}} and {|D_i| = d+1} for {1 \le i \le k}. Accordingly, we decompose {f} over its Fourier transform as

\displaystyle  f = f_1 + \dots + f_k + g

by letting {\widehat{f_i}} be supported on {D_i} and {\widehat g} supported on {R}.

First, we can bound the “leftover” bits in {R}:

\displaystyle  \begin{aligned} \mathbf E_Z \max_{1 \le j \le J} \left\lvert T^{jr} g - g \right\rvert &\le \mathbf E_Z \max_{0 \le j \le J} \sum_{\xi \in R} \left\lvert \widehat f(\xi) \cdot (e(\xi \cdot (x+jr)) - e(\xi \cdot x)) \right\rvert \\ &\le \mathbf E_Z \max_{0 \le j \le J} \sum_{\xi \in R} \left\lvert \widehat f(\xi) \right\rvert \left\lvert (e(\xi \cdot (x+jr)) - e(\xi \cdot x)) \right\rvert \\ &\le \left( \sum_{\xi \in R} \left\lvert \widehat f(\xi) \right\rvert \right) \max_{\substack{0 \le j \le J \\ \xi \in R}} \left\lvert (e(\xi \cdot (x+jr)) - e(\xi \cdot x)) \right\rvert \\ &\le \left( \sum_{\xi \in R} \left\lvert \widehat f(\xi) \right\rvert \right) \max_{\substack{0 \le j \le J \\ \xi \in R}} \left\lvert e(\xi \cdot jr) - 1 \right\rvert \\ &\le \left( \sum_{\xi \in R} \left\lvert \widehat f(\xi) \right\rvert \right) 2\pi \max_{\substack{0 \le j \le J \\ \xi \in R}} \left\lVert \xi \cdot jr \right\rVert_{\mathbb R/\mathbb Z} \end{aligned}

Since each {\xi \in R} is of the form {c_1\eta_1 + \dots + c_d\eta_d} with {c_i \in \{-1,0,1\}}, we have {\left\lVert \xi \cdot jr \right\rVert_{\mathbb R/\mathbb Z} \le Jd \max_{\eta \in S} \left\lVert \eta \cdot r \right\rVert_{\mathbb R/\mathbb Z}} for every {0 \le j \le J}, and hence

\displaystyle  \mathbf E_Z \max_{1 \le j \le J} \left\lvert T^{jr} g - g \right\rvert \le \left( \sum_{\xi \in R} \left\lvert \widehat f(\xi) \right\rvert \right) 2\pi Jd \max_{\eta \in S} \left\lVert \eta \cdot r \right\rVert_{\mathbb R/\mathbb Z}.

Let’s then bound the contribution over each dissociated set. We’ll need both the assumption of uniformity and the proposition we proved for dissociated sets.

\displaystyle  \begin{aligned} \mathbf E_Z \max_{1 \le j \le J} \left\lvert T^{jr} f_i - f_i \right\rvert &\le 2\mathbf E_Z \max_{0 \le j \le J} \left\lvert T^{jr} f_i \right\rvert \\ &\le 2\left\lVert \max_{0 \le j \le J} \left\lvert T^{jr} f_i \right\rvert \right\rVert_{L^2Z} \\ &\ll \sqrt{\log(J)} \left\lVert f_i \right\rVert_{L^2Z} \\ &= \sqrt{\log(J)} \sqrt{\sum_{\xi \in D_i} \left\lvert \widehat f(\xi) \right\rvert^2 } \\ &\ll \sqrt{\frac{\log J}{d}} \sum_{\xi \in D_i} \left\lvert \widehat f(\xi) \right\rvert \end{aligned}

where the last step is by the uniformity of {\widehat f}: on {D_i} we have {\sqrt{\sum_{\xi \in D_i} |\widehat f(\xi)|^2} \le \sqrt{d+1} \max_{\xi \in D_i} |\widehat f(\xi)| \ll \frac{1}{\sqrt d} \sum_{\xi \in D_i} |\widehat f(\xi)|}. Now combine everything with the triangle inequality. \Box

4. Proof of main theorem

Without loss of generality {\mathbf P_ZA = \mathbf P_ZB = \delta}. Of course, we let {f = 1_A \ast 1_B} so {\mathbf E_Z f = \delta^2}. We will have parameters {d \ge 1}, {M \ge 1}, and {J \ge \exp(C\sqrt[3]{\delta^2 \log N})} which we will select at the end.

Our goal is to show there exists some integer {r} such that

\displaystyle  \mathbf E_Z \max_{1 \le j \le J} \left\lvert T^{jr} f - f \right\rvert < \delta^2.

Now we cannot apply the uniformity estimate directly, since {\widehat f} need not be uniform in size; therefore we impose a dyadic decomposition on the frequencies according to the size of {\widehat f}: let

\displaystyle  \begin{aligned} Z_0 &= \left\{ \xi \in Z \;:\; \frac{1}{2} \delta^2 < \left\lvert \widehat f(\xi) \right\rvert \le \delta^2 \right\} \\ Z_1 &= \left\{ \xi \in Z \;:\; \frac14\delta^2 < \left\lvert \widehat f(\xi) \right\rvert \le \frac{1}{2}\delta^2 \right\} \\ Z_2 &= \left\{ \xi \in Z \;:\; \frac18\delta^2 < \left\lvert \widehat f(\xi) \right\rvert \le \frac14\delta^2 \right\} \\ &\vdots \\ Z_{M-1} &= \left\{ \xi \in Z \;:\; 2^{-M} \delta^2 < \left\lvert \widehat f(\xi) \right\rvert \le 2^{-M+1} \delta^2 \right\} \\ Z_{\mathrm{err}} &= \left\{ \xi \in Z \;:\; \left\lvert \widehat f(\xi) \right\rvert \le 2^{-M} \delta^2 \right\} \\ \end{aligned}

Then as before we can decompose via Fourier transform to obtain

\displaystyle  f = f_0 + f_1 + \dots + f_{M-1} + f_{\mathrm{err}}

so that {\widehat{f_m}} is supported on {Z_m} for each {m}, and {\widehat{f_{\mathrm{err}}}} is supported on {Z_{\mathrm{err}}}.
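The bucketing itself is mechanical. A toy sketch (assuming, as holds here, that {|\widehat f(\xi)| \le \delta^2} for every {\xi}; the function name is mine):

```python
def dyadic_levels(fhat, delta2, M):
    """Partition frequencies by |fhat(xi)|: bucket m holds those with
    2^{-(m+1)} * delta2 < |fhat(xi)| <= 2^{-m} * delta2, for 0 <= m < M;
    everything with |fhat(xi)| <= 2^{-M} * delta2 goes to the error bucket."""
    levels = [[] for _ in range(M)]
    err = []
    for xi, coeff in enumerate(fhat):
        a = abs(coeff)
        for m in range(M):
            if delta2 * 2 ** (-(m + 1)) < a <= delta2 * 2 ** (-m):
                levels[m].append(xi)
                break
        else:
            err.append(xi)
    return levels, err
```

Within each bucket the ratio of the largest to smallest coefficient is at most 2, so the uniformity hypothesis of Lemma 9 applies to each {f_m}.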

Now we can apply the previous lemma to get for each {0 \le m < M}:

\displaystyle  \mathbf E_Z \max_{1 \le j \le J} \left\lvert T^{jr} f_m - f_m \right\rvert \ll \left( \sum_{\xi \in Z_m} \left\lvert \widehat f(\xi) \right\rvert \right) \left( \sqrt{\frac{\log J}{d}} + Jd\max_{\eta \in S_m} \left\lVert \eta \cdot r \right\rVert_{\mathbb R/\mathbb Z} \right)

for some {S_m}; hence by summing and using the fact that

\displaystyle  \sum_{\xi \in Z} \left\lvert \widehat f(\xi) \right\rvert = \sum_{\xi \in Z} \left\lvert \widehat 1_A(\xi) \right\rvert \left\lvert \widehat 1_B(\xi) \right\rvert \le \left\lVert \widehat 1_A \right\rVert_{\ell^2Z} \left\lVert \widehat 1_B \right\rVert_{\ell^2Z} = \left\lVert 1_A \right\rVert_{L^2Z} \left\lVert 1_B \right\rVert_{L^2Z} = \sqrt{\mathbf P_ZA \mathbf P_ZB} = \delta

we obtain that

\displaystyle  \sum_{0 \le m < M} \mathbf E_Z \max_{1 \le j \le J} \left\lvert T^{jr} f_m - f_m \right\rvert \ll \delta \left( \sqrt{\frac{\log J}{d}} + Jd\max_{\eta \in \bigcup S_m} \left\lVert \eta \cdot r \right\rVert_{\mathbb R/\mathbb Z} \right).

As for the “error” term, we bound

\displaystyle  \begin{aligned} \mathbf E_Z \max_{1 \le j \le J} \left\lvert T^{jr} f_{\mathrm{err}} - f_{\mathrm{err}} \right\rvert &\le 2\mathbf E_Z \max_{1 \le j \le J} \left\lvert T^{jr} f_{\mathrm{err}} \right\rvert \\ &\le 2\mathbf E_Z \sum_{1 \le j \le J} \left\lvert T^{jr} f_{\mathrm{err}} \right\rvert \\ &\le 2\sum_{1 \le j \le J} \mathbf E_Z \left\lvert T^{jr} f_{\mathrm{err}} \right\rvert \\ &\le 2\sum_{1 \le j \le J} \mathbf E_Z \left\lvert f_{\mathrm{err}} \right\rvert \\ &\le 2J \mathbf E_Z \left\lvert f_{\mathrm{err}} \right\rvert \\ &\le 2J \left\lVert f_{\mathrm{err}} \right\rVert_{L^2Z} \\ &= 2J \left\lVert \widehat f_{\mathrm{err}} \right\rVert_{\ell^2 Z} \\ &= 2J \sqrt{\sum_{\xi \in Z_{\mathrm{err}}} \left\lvert \widehat f_{\mathrm{err}}(\xi) \right\rvert^2} \\ &\le 2J \sqrt{\max_{\xi \in Z_{\mathrm{err}}} \left\lvert \widehat f_{\mathrm{err}}(\xi) \right\rvert \sum_{\xi \in Z_{\mathrm{err}}} \left\lvert \widehat f_{\mathrm{err}}(\xi) \right\rvert} \\ &\le 2J \sqrt{2^{-M}\delta^2 \cdot \delta} \\ &= 2J 2^{-M/2} \delta^{3/2} \\ &\le 2J 2^{-M/2} \delta. \end{aligned}

Thus, putting this all together, we need to find {r \neq 0} such that

\displaystyle  \sqrt{\frac{\log J}{d}} + Jd \max_{\eta\in\bigcup S_m} \left\lVert \eta \cdot r \right\rVert_{\mathbb R/\mathbb Z} + 2J \cdot2^{-M/2} \ll \delta.

Now set {M \asymp \log J} and {d \asymp \delta^{-2} \log J}, so that the first and third terms are each less than {\frac13 c \delta} for a small constant {c}; indeed, by hypothesis

\displaystyle  \delta \gg \sqrt{\frac{(\log \log N)^3}{\log N}}

from which we deduce

\displaystyle  J \gg \exp\left( C\sqrt[3]{\delta^2\log N} \right) \ge \exp\left( C\log \log N \right) = (\log N)^C \gg \delta^{-1}.

Thus it suffices that

\displaystyle  \max_{\eta\in S} \left\lVert \eta \cdot r \right\rVert_{\mathbb R/\mathbb Z} \ll \frac{\delta^3}{J \log J}

where {S = \bigcup S_m}. Note {\left\lvert S \right\rvert \le dM \ll \left( \frac{\log J}{\delta} \right)^2}. Now we recall the standard lower bound on the size of a Bohr set,

\displaystyle  \left\lvert \text{Bohr }(S, \rho) \right\rvert \ge |Z| \rho^{|S|}

and so it suffices for us that

\displaystyle  N \cdot \left( \frac{c_1 \delta^3}{J \log J} \right) ^{c_2 \left( \delta^{-1} \log J \right)^2} > 1

for constants {c_1} and {c_2}. One can check that {J = \exp(C\sqrt[3]{\delta^2 \log N})} now works.
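The Bohr-set size bound {\left\lvert \text{Bohr }(S,\rho) \right\rvert \ge |Z| \rho^{|S|}} invoked above is easy to sanity-check numerically (a toy example of my own; distances are computed in exact integer units of {1/N} to avoid floating-point boundary issues):

```python
# Bohr(S, rho) in Z_N, with the strict cutoff used in this post
N, S, rho = 100, [1, 2], 0.3
thresh = rho * N  # 30.0, exact here

def dist_units(v, N):
    """N * ||v/N||_{R/Z}, computed exactly in integer units."""
    v %= N
    return min(v, N - v)

bohr = [x for x in range(N) if all(dist_units(xi * x, N) < thresh for xi in S)]
# 29 elements here, comfortably above the guaranteed N * rho^|S| = 9
```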

18.099 Transcript: Chang’s Theorem

As part of the 18.099 discrete analysis reading group at MIT, I presented section 4.7 of Tao-Vu’s Additive Combinatorics textbook. Here were the notes I used for the first part of my presentation.

1. Synopsis

In the previous few lectures we’ve worked hard at developing the notions of characters, Bohr sets, and spectra. Today we put this all together to prove some Szemerédi-style results on arithmetic progressions in {\mathbb Z_N}.

Recall that Szemerédi’s Theorem states that:

Theorem 1 (Szemerédi)

Let {k \ge 3} be an integer. Then for sufficiently large {N}, any subset of {\{1, \dots, N\}} with density at least

\displaystyle  \frac{1}{(\log \log N)^{2^{-2^{k+9}}}}

contains a length {k} arithmetic progression.

Notice that the density approaches zero as {N \rightarrow \infty} but it does so extremely slowly.

Our goal is to show much better results for sets like {2A-2A}, {A+B+C} or {A+B}. In this post we will prove:

Theorem 2 (Chang’s Theorem)

Let {K,N \ge 1} and let {A \subseteq Z = \mathbb Z_N}. Suppose {E(A,A) \ge |A|^3 / K}, and let

\displaystyle  d \ll K\left( 1+\log \frac{1}{\mathbf P_Z A} \right).

Then there is a proper symmetric progression {P \subseteq 2A-2A} of rank at most {d} and density

\displaystyle  \mathbf P_Z P \ge d^{-d}.

One can pick {K} such that for example {|A \pm A| \le K|A|}, i.e. if {A} has small Ruzsa diameter. Or one can always pick {K = 1/\mathbf P_Z A}, but then {d} becomes quite large.

We also prove that

Theorem 3

Let {K,N \ge 1} and let {A, B, C \subseteq Z = \mathbb Z_N}. Suppose {|A|=|B|=|C| \ge \frac{1}{K}|A+B+C|} and now let

\displaystyle  d \ll K^2\left( 1+\log \frac{1}{\mathbf P_Z A} \right).

Then there is a proper symmetric progression {P \subseteq A+B+C} of rank at most {d} and

\displaystyle  \mathbf P_Z P \ge d^{-d}.

2. Main steps

Let {S} be the set we want to study (for us, {S=2A-2A} or {S=A+B+C}). Our strategy takes the following four steps.

Step 1. Analyze the Fourier coefficients {\widehat 1_A} of the sets involved. Note in particular the identities

\displaystyle  \begin{aligned} \left\lVert \widehat 1_A \right\rVert_{\ell^\infty(Z)} &= \mathbf P_Z A \\ \left\lVert \widehat 1_A \right\rVert_{\ell^2(Z)} &= \sqrt{\mathbf P_Z A} \\ \left\lVert \widehat 1_A \right\rVert_{\ell^4(Z)}^4 &= \frac{E(A,A)}{|Z|^3}. \end{aligned}

Recall also from the first section of Chapter 4 that

  • The support of {1_A \ast 1_B} is {A+B}.
  • {\widehat{f \ast g} = \widehat f \cdot \widehat g}.
  • {f(x) = \sum_\xi \widehat f(\xi) e(\xi \cdot x)}.
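These identities are easy to verify numerically with an FFT, using the normalization {\widehat 1_A(\xi) = \mathbf E_x 1_A(x) e(-\xi \cdot x)} (the specific set {A} below is just an example of mine):

```python
import numpy as np

N = 16
A = [1, 3, 4, 9]
indA = np.zeros(N)
indA[A] = 1
hatA = np.fft.fft(indA) / N   # normalized so that hatA[0] = P_Z(A)
PA = len(A) / N

# additive energy E(A,A) = #{(a,b,c,d) in A^4 : a + b = c + d mod N}
E = sum((a + b - c - d) % N == 0 for a in A for b in A for c in A for d in A)

assert np.isclose(np.abs(hatA).max(), PA)                  # l^inf norm = P_Z(A)
assert np.isclose((np.abs(hatA) ** 2).sum(), PA)           # l^2 norm squared
assert np.isclose((np.abs(hatA) ** 4).sum(), E / N ** 3)   # l^4 norm to the 4th
```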

Step 2. Find a set of the form {\text{Bohr }(\text{Spec }_\alpha A, \rho)} contained completely inside {S}. Recall that by expanding definitions:

\displaystyle  \text{Bohr }(\text{Spec }_\alpha A, \rho) = \left\{ x \in Z \mid \sup_{\xi \; : \; \left\lvert \widehat 1_A(\xi) \right\rvert \ge \alpha \mathbf P_ZA} \left\lVert \xi \cdot x \right\rVert_{\mathbb R/\mathbb Z} < \rho \right\}.
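Both {\text{Spec}} and {\text{Bohr}} are pleasant to compute directly on small groups (a sketch; function names are mine, and the FFT normalization matches {\widehat 1_A(\xi) = \mathbf E_x 1_A(x) e(-\xi \cdot x)}):

```python
import numpy as np

def spec(A, N, alpha):
    """Spec_alpha(A) = {xi : |hat 1_A(xi)| >= alpha * P_Z(A)}."""
    ind = np.zeros(N)
    ind[list(A)] = 1
    hat = np.abs(np.fft.fft(ind)) / N
    return {xi for xi in range(N) if hat[xi] >= alpha * len(A) / N}

def bohr(S, N, rho):
    """Bohr(S, rho) = {x : ||xi * x / N||_{R/Z} < rho for all xi in S}."""
    def dist(v):
        v %= N
        return min(v, N - v) / N
    return {x for x in range(N) if all(dist(xi * x) < rho for xi in S)}
```

For instance `spec({0, 1}, 8, 0.9)` works out to {0, 1, 7}, and `bohr({1}, 12, 0.2)` to {0, 1, 2, 10, 11}; note that 0 always lies in both.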

Step 3. Use the triangle inequality and the Fourier concentration lemma (covering). Recall that this says:

Lemma 4 (Fourier Concentration, or “Covering Lemma”, Tao-Vu 4.36)

Let {A \subseteq Z}, and let {0 < \alpha \le 1}. Then one can pick {\eta_1}, \dots, {\eta_d} such that

\displaystyle  d \ll \frac{1 + \log \frac{1}{\mathbf P_ZA}}{\alpha^2}

and {\text{Spec }_\alpha A} is contained in a {d}-cube, i.e. it’s covered by {c_1\eta_1 + \dots + c_d\eta_d} where {c_i \in \{-1,0,1\}}.

Using such a {d}, we have by the triangle inequality

\displaystyle  \text{Bohr }\left(\{\eta_1, \dots, \eta_d\}, \frac{\rho}{d} \right) \subseteq \text{Bohr }\left( \text{Spec }_\alpha A, \rho \right).  \ \ \ \ \ (1)

Step 4. We use the fact that Bohr sets contain long arithmetic progressions:

Theorem 5 (Bohr sets have long coset progressions, Tao-Vu 4.23)

Let {Z = \mathbb Z_N}. Then within {\text{Bohr }(S, r)} one can select a proper symmetric progression {P} such that

\displaystyle  \mathbf P_Z P \ge \left( \frac{r}{|S|} \right)^{|S|}

and {\text{rank } P \le |S|}.

The third step is necessary because in the bound for the preceding theorem, the dependence on {|S|} is much more severe than the dependence on {r}. Therefore it is necessary to use the Fourier concentration lemma in order to reduce the size of {|S|} before applying the result.

3. Proof of Chang’s theorem

We carry out the first two steps in the following proposition.

Proposition 6

Let {A \subseteq Z} and {0 < \alpha \le 1}, and assume {E(A,A) \ge 4\alpha^2 |A|^3}. Then

\displaystyle  \text{Bohr }\left(\text{Spec }_\alpha A, \frac 16\right) \subseteq 2A-2A.

Proof: To do this, as advertised consider

\displaystyle  f = 1_A \ast 1_A \ast 1_{-A} \ast 1_{-A}.

We want to show that any {x \in \text{Bohr }(\text{Spec }_\alpha A, \frac 16)} lies in the support of {f}. Note that if {x} does lie in this Bohr set, then since {\left\lVert \xi \cdot x \right\rVert_{\mathbb R/\mathbb Z} < \frac16} gives {\text{Re } e(\xi \cdot x) = \cos\left( 2\pi \left\lVert \xi \cdot x \right\rVert_{\mathbb R/\mathbb Z} \right) > \cos \frac\pi3}, we have

\displaystyle  \text{Re } e(\xi \cdot x) \ge \frac{1}{2} \qquad \forall \xi \in \text{Spec }_\alpha A.

We now aim to show {f(x) > 0}. This follows by computing

\displaystyle  \begin{aligned} f(x) &= 1_A \ast 1_A \ast 1_{-A} \ast 1_{-A}(x) \\ &= \sum_\xi \widehat 1_A(\xi)^2 \widehat 1_{-A}(\xi)^2 e(\xi \cdot x) \\ &= \sum_\xi |\widehat 1_A(\xi)|^4 e(\xi \cdot x) \end{aligned}

where the last step uses {\widehat 1_{-A}(\xi) = \overline{\widehat 1_A(\xi)}}. Now we split the sum over {\text{Spec }_\alpha A}:

\displaystyle  \begin{aligned} f(x) &= \sum_{\xi \in \text{Spec }_\alpha(A)} |\widehat 1_A(\xi)|^4 e(\xi \cdot x) + \sum_{\xi \notin \text{Spec }_\alpha(A)} |\widehat 1_A(\xi)|^4 e(\xi \cdot x). \end{aligned}

Now we take the real part of both sides:

\displaystyle  \begin{aligned} \text{Re } f(x) &\ge \sum_{\xi \in \text{Spec }_\alpha(A)} |\widehat 1_A(\xi)|^4 \cdot \frac{1}{2} - \sum_{\xi \notin \text{Spec }_\alpha(A)} |\widehat 1_A(\xi)|^4 \\ &= \frac{1}{2} \sum_{\xi} |\widehat 1_A(\xi)|^4 - \frac32 \sum_{\xi \notin \text{Spec }_\alpha(A)} |\widehat 1_A(\xi)|^4 \\ &= \frac{1}{2} \frac{E(A,A)}{|Z|^3} - \frac32 \sum_{\xi \notin \text{Spec }_\alpha(A)} |\widehat 1_A(\xi)|^4. \end{aligned}

By definition of {\text{Spec }_\alpha A} we can bound two of the {\left\lvert \widehat 1_A(\xi) \right\rvert} factors via

\displaystyle  \begin{aligned} \text{Re } f(x) &\ge \frac{1}{2} \frac{E(A,A)}{|Z|^3} - \frac32 (\alpha\mathbf P_Z A)^2 \sum_{\xi \notin \text{Spec }_\alpha(A)} |\widehat 1_A(\xi)|^2 \end{aligned}

Now the last sum is the square of the {\ell^2} norm, hence

\displaystyle  \begin{aligned} \text{Re } f(x) &\ge \frac{1}{2} \frac{E(A,A)}{|Z|^3} - \frac32 (\alpha\mathbf P_Z A)^2 \cdot \mathbf P_ZA \\ &\ge \frac{1}{2} \frac{E(A,A)}{|Z|^3} - \frac32 \alpha^2 \frac{|A|^3}{|Z|^3} > 0 \end{aligned}

by the assumption {E(A,A) \ge 4\alpha^2 |A|^3}. \Box
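Proposition 6 can be sanity-checked numerically on a small example (entirely my own toy setup; any {\alpha} with {E(A,A) \ge 4\alpha^2 |A|^3} is allowed, so we take nearly the largest one):

```python
import numpy as np

N = 20
A = [0, 1, 3, 7, 12]
ind = np.zeros(N)
ind[A] = 1
hat = np.abs(np.fft.fft(ind)) / N
PA = len(A) / N

# additive energy, and the largest admissible alpha (shrunk slightly)
E = sum((a + b - c - d) % N == 0 for a in A for b in A for c in A for d in A)
alpha = 0.99 * 0.5 * (E / len(A) ** 3) ** 0.5

spec = {xi for xi in range(N) if hat[xi] >= alpha * PA}
bohr = {x for x in range(N)
        if all(min((xi * x) % N, N - (xi * x) % N) / N < 1 / 6 for xi in spec)}

two_A_minus_two_A = {(a + b - c - d) % N for a in A for b in A for c in A for d in A}
assert bohr <= two_A_minus_two_A   # the inclusion of Proposition 6
```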

Now, let {\alpha = \frac{1}{2\sqrt K}}, and let

\displaystyle  d \ll \frac{1 + \log \frac{1}{\mathbf P_Z A}}{\alpha^2} \ll K\left( 1 + \log \frac{1}{\mathbf P_Z A} \right).

Then by (1), we have

\displaystyle  \text{Bohr }\left(\{\eta_1, \dots, \eta_d\}, \frac{1}{6d} \right) \subseteq \text{Bohr }\left( \text{Spec }_\alpha A, \frac16 \right) \subseteq 2A-2A.

and then using the main result on Bohr sets, we can find a symmetric progression of density at least

\displaystyle  \left( \frac{1/(6d)}{d} \right)^d = \left( 6d^2 \right)^{-d} \ge (3d)^{-3d}

and whose rank is at most {d}. Replacing {d} by {3d}, which is still {\ll K\left( 1 + \log \frac{1}{\mathbf P_Z A} \right)}, gives the claimed density bound {d^{-d}}. This completes the proof of Chang’s theorem.

4. Proof of the second theorem

This time, the Bohr set we want to use is:

Proposition 7

Let {\alpha = \frac{1}{2\pi K}}. Then

\displaystyle  \text{Bohr }\left(\text{Spec }_\alpha A, \frac{1}{2\pi K}\right) \subseteq A+B+C.

Proof: Let {f = 1_A \ast 1_B \ast 1_C}. Note that we have {\mathbf P_Z(A+B+C) \le K\mathbf P_Z A}, while {\mathbf E_Z f = (\mathbf P_ZA)^3}. So by shifting {C}, we may assume without loss of generality that

\displaystyle  f(0) \ge \frac{(\mathbf P_ZA)^3}{K\mathbf P_ZA} \ge \frac{1}{K} (\mathbf P_ZA)^2.

Now, consider {x} in the Bohr set. Then we have

\displaystyle  \begin{aligned} \left\lvert f(x)-f(0) \right\rvert &= \left\lvert \sum_\xi \widehat1_A(\xi) \widehat1_B(\xi) \widehat1_C(\xi) \left( e(\xi \cdot x) - 1 \right) \right\rvert \\ &\le \sum_\xi \left\lvert \widehat 1_A(\xi) \right\rvert \left\lvert \widehat 1_B(\xi) \right\rvert \left\lvert \widehat 1_C(\xi) \right\rvert \left\lvert e(\xi \cdot x) - 1 \right\rvert \\ &\le 2\pi \sum_\xi \left\lvert \widehat 1_A(\xi) \right\rvert \left\lvert \widehat 1_B(\xi) \right\rvert \left\lvert \widehat 1_C(\xi) \right\rvert \left\lVert \xi \cdot x \right\rVert_{\mathbb R/\mathbb Z}. \end{aligned}

Bounding by the maximum for {A}, and then using Cauchy-Schwarz,

\displaystyle  \begin{aligned} \left\lvert f(x)-f(0) \right\rvert &\le 2\pi \sup_\xi \left( \left\lvert \widehat 1_A(\xi) \right\rvert \left\lVert \xi \cdot x \right\rVert_{\mathbb R/\mathbb Z} \right) \sum_\xi \left\lvert \widehat 1_B(\xi) \right\rvert \left\lvert \widehat 1_C(\xi) \right\rvert \\ &\le 2\pi \sup_\xi \left( \left\lvert \widehat 1_A(\xi) \right\rvert \left\lVert \xi \cdot x \right\rVert_{\mathbb R/\mathbb Z} \right) \sqrt{ \sum_\xi \left\lvert \widehat 1_B(\xi) \right\rvert^2 \sum_\xi \left\lvert \widehat 1_C(\xi) \right\rvert^2} \\ &\le 2\pi \mathbf P_Z A \cdot \sup_\xi \left( \left\lvert \widehat 1_A(\xi) \right\rvert \left\lVert \xi \cdot x \right\rVert_{\mathbb R/\mathbb Z} \right) \end{aligned}

Claim: if {x \in \text{Bohr }(\text{Spec }_\alpha A, \frac{1}{2\pi K})} and {\xi \in Z} then

\displaystyle  \left\lvert \widehat 1_A(\xi) \right\rvert \left\lVert \xi \cdot x \right\rVert_{\mathbb R/\mathbb Z} < \frac{1}{2\pi K} \mathbf P_ZA

Indeed one just considers two cases:

  • If {\xi \in \text{Spec }_\alpha A}, then {\left\lVert \xi \cdot x \right\rVert_{\mathbb R/\mathbb Z} < \alpha} ({x} in Bohr set) and {\left\lvert \widehat1_A(\xi) \right\rvert \le \mathbf P_ZA}.
  • If {\xi \notin \text{Spec }_\alpha A}, then {\left\lvert \widehat 1_A(\xi) \right\rvert < \alpha \mathbf P_ZA} ({\xi} outside Spec) and {\left\lVert \xi \cdot x \right\rVert_{\mathbb R/\mathbb Z} \le 1}.

So finally, we have

\displaystyle  \left\lvert f(x)-f(0) \right\rvert \le 2\pi \mathbf P_Z A \cdot \sup_\xi \left( \left\lvert \widehat 1_A(\xi) \right\rvert \left\lVert \xi \cdot x \right\rVert_{\mathbb R/\mathbb Z} \right) < \frac{(\mathbf P_ZA)^2}{K} \le f(0)

and this implies {f(x) \neq 0}. \Box
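Here too one can sanity-check the proposition numerically; since the proof shifts {C}, the code below recenters at a most-popular sum {x_0} (a toy setup of my own):

```python
import numpy as np
from collections import Counter

N = 30
A = [0, 2, 5, 9]
B = [1, 4, 6, 11]
C = [3, 7, 8, 20]

reps = Counter((a + b + c) % N for a in A for b in B for c in C)
K = len(reps) / len(A)          # here |A+B+C| = K|A| exactly
alpha = 1 / (2 * np.pi * K)

ind = np.zeros(N)
ind[A] = 1
hat = np.abs(np.fft.fft(ind)) / N
spec = {xi for xi in range(N) if hat[xi] >= alpha * len(A) / N}
bohr = {x for x in range(N)
        if all(min((xi * x) % N, N - (xi * x) % N) / N < alpha for xi in spec)}

x0 = max(reps, key=reps.get)    # shifting C by -x0 maximizes f(0)
assert all((x0 + x) % N in reps for x in bohr)   # translate of Bohr set in A+B+C
```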

Once more by (1), we have

\displaystyle  \text{Bohr }\left(\{\eta_1, \dots, \eta_d\}, \frac{1}{2\pi Kd} \right) \subseteq \text{Bohr }\left( \text{Spec }_\alpha A, \frac{1}{2\pi K} \right) \subseteq A+B+C

where

\displaystyle  d \ll \frac{1+\log \frac{1}{\mathbf P_ZA}}{\alpha^2} \ll K^2\left( 1 + \log \frac{1}{\mathbf P_Z A} \right).

So by the main theorem on Bohr sets again, there is a symmetric progression of density at least

\displaystyle  \left( \frac{\frac{1}{2\pi Kd}}{d} \right)^d = \left( 2\pi K d^2 \right)^{-d}

and rank at most {d}. As in the previous proof, the constant factors can be absorbed by replacing {d} with a bounded multiple of itself, giving the claimed density bound {d^{-d}}. This completes the proof of the second theorem.