Combinatorial Nullstellensatz and List Coloring

More than six months late, but here are notes from the Combinatorial Nullstellensatz talk I gave at the student colloquium at MIT. This was also my term paper for 18.434, “Seminar in Theoretical Computer Science”.

1. Introducing the choice number

One of the most fundamental problems in graph theory is that of a graph coloring, in which one assigns a color to every vertex of a graph so that no two adjacent vertices have the same color. The most basic invariant related to the graph coloring is the chromatic number:

Definition 1

A simple graph {G} is {k}-colorable if it’s possible to properly color its vertices with {k} colors. The smallest such {k} is the chromatic number {\chi(G)}.

In this exposition we study a more general notion in which the set of permitted colors is different for each vertex, as long as at least {k} colors are listed at each vertex. This leads to the notion of a so-called choice number, which was introduced by Erdős, Rubin, and Taylor.

Definition 2

A simple graph {G} is {k}-choosable if it’s possible to properly color its vertices given a list of {k} colors at each vertex. The smallest such {k} is the choice number {\mathop{\mathrm{ch}}(G)}.

Example 3

We have {\mathop{\mathrm{ch}}(C_{2n}) = \chi(C_{2n}) = 2} for any integer {n \ge 2} (here {C_{2n}} is the cycle graph on {2n} vertices). To see this, we only have to show that given a list of two colors at each vertex of {C_{2n}}, we can select one color from each list so that the resulting coloring is proper.

  • If the list of colors is the same at each vertex, then since {C_{2n}} is bipartite, we are done.
  • Otherwise, suppose adjacent vertices {v_1}, {v_{2n}} are such that some color {c} in the list at {v_1} is not in the list at {v_{2n}}. Select {c} at {v_1}, and then greedily color in {v_2}, \dots, {v_{2n}} in that order.
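The two cases above can be turned into a short procedure. The following is a minimal sketch, assuming the vertices are numbered {0, \dots, m-1} around the cycle with {m \ge 4} even and the colors are comparable values such as integers; the function name `color_even_cycle` is hypothetical.

```python
def color_even_cycle(lists):
    """Properly color the cycle 0-1-...-(m-1)-0 (m even) from 2-element lists."""
    m = len(lists)
    lists = [set(L) for L in lists]
    assert m % 2 == 0 and m >= 4 and all(len(L) == 2 for L in lists)

    # Case 1: all lists agree, so alternate the two colors (the cycle is bipartite).
    if all(L == lists[0] for L in lists):
        a, b = sorted(lists[0])
        return [a if i % 2 == 0 else b for i in range(m)]

    # Case 2: pick a vertex whose list differs from that of its predecessor.
    # It plays the role of v_1, and its predecessor plays the role of v_{2n}.
    start = next(i for i in range(m) if lists[i] != lists[i - 1])
    c = next(iter(lists[start] - lists[start - 1]))

    coloring = {start: c}
    for step in range(1, m):
        v = (start + step) % m
        prev = (start + step - 1) % m
        # Greedy step: only the previously colored neighbor can block a color.
        coloring[v] = next(x for x in sorted(lists[v]) if x != coloring[prev])
    return [coloring[i] for i in range(m)]
```

The last vertex colored is the predecessor of `start`; its list does not contain `c`, so it cannot clash with `start`.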

We are thus naturally interested in how the choice number and the chromatic number are related. Of course we always have

\displaystyle \mathop{\mathrm{ch}}(G) \ge \chi(G).

Naïvely one might expect that we in fact have an equality, since allowing the colors at vertices to be different seems like it should make the graph easier to color. However, the following example shows that this is not the case.

Example 4 (Erdős)

Let {n \ge 1} be an integer and define

\displaystyle G = K_{n^n, n}.

We claim that for any integer {n \ge 1} we have

\displaystyle \mathop{\mathrm{ch}}(G) \ge n+1 \quad\text{and}\quad \chi(G) = 2.

The latter equality follows from {G} being bipartite.

Now to see the first inequality, let {G} have vertex set {U \cup V}, where {U} is the set of functions {u : [n] \rightarrow [n]} and {V = [n]}. Then consider {n^2} colors {C_{i,j}} for {1 \le i, j \le n}. On a vertex {u \in U}, we list colors {C_{1,u(1)}}, {C_{2,u(2)}}, \dots, {C_{n,u(n)}}. On a vertex {v \in V}, we list colors {C_{v,1}}, {C_{v,2}}, \dots, {C_{v,n}}. It is impossible to properly color {G} from these lists: if each {v \in V} receives the color {C_{v,f(v)}}, then every color on the list of the vertex {u = f \in U} is already used by one of its neighbors.
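For small {n} the claim can be checked by brute force. The sketch below is only an illustration: it encodes the color {C_{i,j}} as the pair `(i, j)` with indices starting at 0, and the function name `erdos_lists_uncolorable` is hypothetical.

```python
from itertools import product

def erdos_lists_uncolorable(n):
    """Return True if K_{n^n, n} with the lists of Example 4 has no proper coloring."""
    U = list(product(range(n), repeat=n))      # functions u: [n] -> [n], as tuples
    for f in product(range(n), repeat=n):      # vertex v in V gets the color (v, f[v])
        used = {(v, f[v]) for v in range(n)}
        # A vertex u can still be colored iff some color (i, u[i]) is unused on V.
        if all(any((i, u[i]) not in used for i in range(n)) for u in U):
            return False                       # this choice on V would extend to U
    return True
```

Every vertex of {U} is adjacent to all of {V}, so it suffices to loop over the colorings of {V} and ask whether each {u \in U} still has a free color.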

The case {n = 3} is illustrated in the figure below (image in public domain).


This surprising behavior is the subject of much research: how can we bound the choice number of a graph as a function of its chromatic number and other properties of the graph? We see that the above example requires exponentially many vertices in {n}.

Theorem 5 (Noel, West, Wu, Zhu)

If {G} is a graph with {n} vertices then

\displaystyle \chi(G) \le \mathop{\mathrm{ch}}(G) \le \max\left( \chi(G), \left\lceil \frac{\chi(G)+n-1}{3} \right\rceil \right).

In particular, if {n \le 2\chi(G)+1} then {\mathop{\mathrm{ch}}(G) = \chi(G)}.

One of the major open problems in this direction is the following.

Definition 6

A claw-free graph is a graph with no induced {K_{3,1}}. For example, the line graph (also called edge graph) of any simple graph {G} is claw-free.

Conjecture. If {G} is a claw-free graph, then {\mathop{\mathrm{ch}}(G) = \chi(G)}. In particular, this conjecture implies that for edge coloring, the notions of “chromatic number” and “choice number” coincide.


In this exposition, we prove the following result of Alon.

Theorem 7 (Alon)

A bipartite graph {G} is {\left(\left\lceil L(G) \right\rceil+1\right)}-choosable, where

\displaystyle L(G) \overset{\mathrm{def}}{=} \max_{H \subseteq G} |E(H)|/|V(H)|

is half the maximum of the average degree of subgraphs {H}.

In particular, recall that a planar bipartite graph {H} with {r} vertices contains at most {2r-4} edges. Thus for such graphs we have {L(G) \le 2} and deduce:

Corollary 8

A planar bipartite graph is {3}-choosable.

This corollary is sharp, as it applies to {K_{2,4}}, which by Example 4 (with {n = 2}) has {\mathop{\mathrm{ch}}(K_{2,4}) \ge 3} and hence {\mathop{\mathrm{ch}}(K_{2,4}) = 3}.

The rest of the paper is divided as follows. First, we begin in §2 by stating Theorem 9, the famous combinatorial nullstellensatz of Alon. Then in §3 and §4, we provide descriptions of the so-called graph polynomial, to which we then apply combinatorial nullstellensatz to deduce Theorem 18. Finally in §5, we show how to use Theorem 18 to prove Theorem 7.

2. Combinatorial Nullstellensatz

The main tool we use is the Combinatorial Nullstellensatz of Alon.

Theorem 9 (Combinatorial Nullstellensatz)

Let {F} be a field, and let {f \in F[x_1, \dots, x_n]} be a polynomial of degree {t_1 + \dots + t_n}. Let {S_1, S_2, \dots, S_n \subseteq F} be subsets with {\left\lvert S_i \right\rvert > t_i} for all {i}.

Assume the coefficient of {x_1^{t_1}x_2^{t_2}\dots x_n^{t_n}} of {f} is not zero. Then we can pick {s_1 \in S_1}, \dots, {s_n \in S_n} such that

\displaystyle f(s_1, s_2, \dots, s_n) \neq 0.

Example 10

Let us give a second proof that

\displaystyle \mathop{\mathrm{ch}}(C_{2n}) = 2

for every positive integer {n}. Our proof will be an application of the Nullstellensatz.

Regard the colors as real numbers, and let {S_i} be the set of colors at vertex {i} (hence {1 \le i \le 2n}, and {|S_i| = 2}). Consider the polynomial

\displaystyle f = \left( x_1-x_2 \right)\left( x_2-x_3 \right) \dots \left( x_{2n-1}-x_{2n} \right)\left( x_{2n}-x_1 \right)

The coefficient of {x_1^1 x_2^1 \dots x_{2n}^1} is {2 \neq 0}. Therefore, one can select a color from each {S_i} so that {f} does not vanish.
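The coefficient claimed above can be checked mechanically by expanding the product. This is a small sketch that stores a polynomial as a dictionary from exponent tuples to integer coefficients; the helper name `cycle_polynomial` is hypothetical.

```python
from collections import defaultdict

def cycle_polynomial(m):
    """Expand (x_1 - x_2)(x_2 - x_3)...(x_m - x_1) as {exponent tuple: coefficient}."""
    poly = {(0,) * m: 1}
    for i in range(m):
        j = (i + 1) % m
        new = defaultdict(int)
        for exps, coef in poly.items():
            # Multiply the current term by (x_i - x_j).
            for var, sign in ((i, 1), (j, -1)):
                e = list(exps)
                e[var] += 1
                new[tuple(e)] += sign * coef
        poly = dict(new)
    return poly

coefficient = cycle_polynomial(6)[(1,) * 6]   # coefficient of x_1 x_2 ... x_6
```

For an odd cycle the same computation gives coefficient {1 + (-1)^m = 0}, consistent with odd cycles not being {2}-choosable.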

3. The Graph Polynomial, and Directed Orientations

Motivated by Example 10, we wish to apply a similar technique to general graphs {G}. So in what follows, let {G} be a (simple) graph with vertex set {\{1, \dots, n\}}.

Definition 11

The graph polynomial of {G} is defined by

\displaystyle f_G(x_1, \dots, x_n) = \prod_{\substack{(i,j) \in E(G) \\ i < j}} (x_i-x_j).

We observe that coefficients of {f_G} correspond to differences in directed orientations. To be precise, we introduce the notation:

Definition 12

Consider orientations on the graph {G} with vertex set {\{1, \dots, n\}}, meaning we assign a direction {v \rightarrow w} to every edge of {G} to make it into a directed graph. An oriented edge {v \rightarrow w} is called ascending if {v < w}, i.e. the edge points from the smaller number to the larger one.

Then we say that an orientation is

  • even if there are an even number of ascending edges, and
  • odd if there are an odd number of ascending edges.

Finally, we define

  • {\mathop{\mathrm{DE}}_G(d_1, \dots, d_n)} to be the set of all even orientations of {G} in which vertex {i} has indegree {d_i}.
  • {\mathop{\mathrm{DO}}_G(d_1, \dots, d_n)} to be the set of all odd orientations of {G} in which vertex {i} has indegree {d_i}.

Set {\mathop{\mathrm{D}}_G(d_1,\dots,d_n) = \mathop{\mathrm{DE}}_G(d_1,\dots,d_n) \cup \mathop{\mathrm{DO}}_G(d_1,\dots,d_n)}.

Example 13

Consider the following orientation:

Here the edges are oriented {1 \rightarrow 2}, {3 \rightarrow 2}, {2 \rightarrow 4} and {4 \rightarrow 3}. There are exactly two ascending edges, namely {1 \rightarrow 2} and {2 \rightarrow 4}. The indegrees are {d_1 = 0}, {d_2 = 2} and {d_3 = d_4 = 1}. Therefore, this particular orientation is an element of {\mathop{\mathrm{DE}}_G(0,2,1,1)}. In terms of {f_G}, this corresponds to the choice of terms

\displaystyle \left( x_1- \boldsymbol{x_2} \right) \left( \boldsymbol{x_2}-x_3 \right) \left( x_2-\boldsymbol{x_4} \right) \left( \boldsymbol{x_3}-x_4 \right)

which is a {+ x_2^2 x_3 x_4} term.

Lemma 14

In the graph polynomial of {G}, the coefficient of {x_1^{d_1} \dots x_n^{d_n}} is

\displaystyle \left\lvert \mathop{\mathrm{DE}}_G(d_1, \dots, d_n) \right\rvert - \left\lvert \mathop{\mathrm{DO}}_G(d_1, \dots, d_n) \right\rvert.

Proof: Consider expanding {f_G}. Then each expanded term corresponds to a choice of {x_i} or {x_j} from each {(i,j)}, as in Example 13. The term has coefficient {+1} if the orientation is even, and {-1} if the orientation is odd, as desired. \Box
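Lemma 14 is easy to test on a small graph by enumerating all orientations and comparing against the expanded polynomial. The sketch below uses the graph of Example 13 (edges 12, 23, 24, 34); the function names are hypothetical.

```python
from itertools import product
from collections import Counter

def orientation_counts(n, edges):
    """Map each indegree vector to |DE| - |DO|; edges are pairs (i, j) with i < j."""
    diff = Counter()
    for choice in product((0, 1), repeat=len(edges)):
        indeg = [0] * n
        ascending = 0
        for (i, j), c in zip(edges, choice):
            head = j if c else i          # c = 1 means i -> j, an ascending edge
            indeg[head - 1] += 1
            ascending += c
        diff[tuple(indeg)] += 1 if ascending % 2 == 0 else -1
    return diff

def graph_polynomial(n, edges):
    """Expand the product of (x_i - x_j) over the edges as {exponents: coefficient}."""
    poly = Counter({(0,) * n: 1})
    for i, j in edges:
        new = Counter()
        for exps, coef in poly.items():
            for var, sign in ((i, 1), (j, -1)):
                e = list(exps)
                e[var - 1] += 1
                new[tuple(e)] += sign * coef
        poly = new
    return poly

edges = [(1, 2), (2, 3), (2, 4), (3, 4)]
diff = orientation_counts(4, edges)
poly = graph_polynomial(4, edges)
```

For this graph the indegree vector {(0,2,1,1)} is realized by exactly one even orientation (the one in Example 13) and one odd orientation, so the coefficient of {x_2^2 x_3 x_4} in {f_G} is {0}.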

Thus we have an explicit combinatorial description of the coefficients in the graph polynomial {f_G}.

4. Coefficients via Eulerian Suborientations

We now give a second description of the coefficients of {f_G}.

Definition 15

Let {D \in \mathop{\mathrm{D}}_G(d_1, \dots, d_n)}, viewed as a directed graph. An Eulerian suborientation of {D} is a subgraph of {D} (not necessarily induced) in which every vertex has equal indegree and outdegree. We say that such a suborientation is

  • even if it has an even number of edges, and
  • odd if it has an odd number of edges.

Note that the empty suborientation is allowed. We denote the even and odd Eulerian suborientations of {D} by {\mathop{\mathrm{EE}}(D)} and {\mathop{\mathrm{EO}}(D)}, respectively.

Eulerian suborientations are brought into the picture by the following lemma.

Lemma 16

Assume {D \in \mathop{\mathrm{DE}}_G(d_1, \dots, d_n)}. Then there are natural bijections

\displaystyle \begin{aligned} \mathop{\mathrm{DE}}_G(d_1, \dots, d_n) &\rightarrow \mathop{\mathrm{EE}}(D) \\ \mathop{\mathrm{DO}}_G(d_1, \dots, d_n) &\rightarrow \mathop{\mathrm{EO}}(D). \end{aligned}

Similarly, if {D \in \mathop{\mathrm{DO}}_G(d_1, \dots, d_n)} then there are bijections

\displaystyle \begin{aligned} \mathop{\mathrm{DE}}_G(d_1, \dots, d_n) &\rightarrow \mathop{\mathrm{EO}}(D) \\ \mathop{\mathrm{DO}}_G(d_1, \dots, d_n) &\rightarrow \mathop{\mathrm{EE}}(D). \end{aligned}

Proof: Consider any orientation {D' \in \mathop{\mathrm{D}}_G(d_1, \dots, d_n)}. Then we define a suborientation of {D}, denoted {D \rtimes D'}, by including exactly the edges of {D} whose orientation in {D'} is in the opposite direction. Since {D} and {D'} have the same indegrees, this suborientation is Eulerian, and it’s easy to see that this induces a bijection

\displaystyle D \rtimes - : \mathop{\mathrm{D}}_G(d_1, \dots, d_n) \rightarrow \mathop{\mathrm{EE}}(D) \cup \mathop{\mathrm{EO}}(D)

Moreover, remark that

  • {D \rtimes D'} is even if {D} and {D'} are either both even or both odd, and
  • {D \rtimes D'} is odd otherwise.

The lemma follows from this. \Box

Corollary 17

In the graph polynomial of {G}, the coefficient of {x_1^{d_1} \dots x_n^{d_n}} is

\displaystyle \pm \left( \left\lvert \mathop{\mathrm{EE}}(D) \right\rvert - \left\lvert \mathop{\mathrm{EO}}(D) \right\rvert \right)

where {D \in \mathop{\mathrm{D}}_G(d_1, \dots, d_n)} is arbitrary.

Proof: Combine Lemma 14 and Lemma 16. \Box
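For a small digraph the quantities {\left\lvert \mathop{\mathrm{EE}}(D) \right\rvert} and {\left\lvert \mathop{\mathrm{EO}}(D) \right\rvert} can be counted directly by running over all subsets of arcs. This is a brute-force sketch, exponential in the number of arcs and meant only for illustration; the name `eulerian_counts` is hypothetical.

```python
from itertools import combinations

def eulerian_counts(n, arcs):
    """Return (|EE(D)|, |EO(D)|) for the digraph D on vertices 1..n with the given arcs."""
    even = odd = 0
    for k in range(len(arcs) + 1):
        for sub in combinations(arcs, k):
            balance = [0] * (n + 1)       # indegree minus outdegree at each vertex
            for v, w in sub:
                balance[v] -= 1
                balance[w] += 1
            if not any(balance):          # Eulerian: every vertex is balanced
                if k % 2 == 0:
                    even += 1
                else:
                    odd += 1
    return even, odd

four_cycle = [(1, 2), (2, 3), (3, 4), (4, 1)]
triangle = [(1, 2), (2, 3), (3, 1)]
```

The cyclically oriented {4}-cycle has the empty suborientation and the full cycle, both even, matching the coefficient {2} from Example 10 up to sign; the cyclically oriented triangle has one even and one odd suborientation, so the corresponding coefficient vanishes.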

We now arrive at the main result:

Theorem 18

Let {G} be a graph on {\{1, \dots, n\}}, and let {D \in \mathop{\mathrm{D}}_G(d_1, \dots, d_n)} be an orientation of {G}. If {\left\lvert \mathop{\mathrm{EE}}(D) \right\rvert \neq \left\lvert \mathop{\mathrm{EO}}(D) \right\rvert}, then given a list of {d_i+1} colors at each vertex of {G}, there exists a proper coloring of the vertices of {G}.

In particular, {G} is {(1+\max_i d_i)}-choosable.

Proof: Combine Corollary 17 with Theorem 9. \Box

5. Finding an orientation

Armed with Theorem 18, we are almost ready to prove Theorem 7. The last ingredient is that we need to find an orientation on {G} in which the maximal indegree is not too large. This is accomplished by the following.

Lemma 19

Let {L(G) \overset{\mathrm{def}}{=} \max_{H \subseteq G} |E(H)|/|V(H)|} as in Theorem 7. Then {G} has an orientation in which every indegree is at most {\left\lceil L(G) \right\rceil}.

Proof: This is an application of Hall’s marriage theorem.

Let {d = \left\lceil L(G) \right\rceil \ge L(G)}. Construct a bipartite graph

\displaystyle E \cup X \qquad \text{where}\qquad E = E(G) \quad\text{ and }\quad X = \underbrace{V(G) \sqcup \dots \sqcup V(G)}_{d \text{ times}}.

Connect {e \in E} and {v \in X} if {v} is an endpoint of {e}. Since {d \ge L(G)} we satisfy Hall’s condition (as {L(G)} is a condition for all subgraphs {H \subseteq G}) and can match each edge in {E} to a (copy of some) vertex in {X}. Since there are exactly {d} copies of each vertex in {X}, the conclusion follows. \Box
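The matching argument is constructive. The following sketch computes {L(G)} by brute force over vertex subsets (induced subgraphs suffice for the maximum) and then finds the orientation with a standard augmenting-path bipartite matching, which is one convenient way to realize Hall's theorem algorithmically. The function names are hypothetical, and the code is meant for small graphs only.

```python
from fractions import Fraction
from itertools import combinations
from math import ceil

def max_density(vertices, edges):
    """L(G): the maximum of |E(H)| / |V(H)| over nonempty induced subgraphs H."""
    best = Fraction(0)
    for k in range(1, len(vertices) + 1):
        for S in combinations(vertices, k):
            s = set(S)
            m = sum(1 for u, v in edges if u in s and v in s)
            best = max(best, Fraction(m, k))
    return best

def bounded_orientation(edges, d):
    """Orient every edge so that each indegree is at most d; return arcs (tail, head)."""
    slot_owner = {}                        # (vertex, copy) -> index of matched edge

    def try_assign(e, seen):
        for v in edges[e]:                 # an edge may be matched to either endpoint
            for c in range(d):             # ... using any of the d copies of it
                slot = (v, c)
                if slot in seen:
                    continue
                seen.add(slot)
                if slot not in slot_owner or try_assign(slot_owner[slot], seen):
                    slot_owner[slot] = e
                    return True
        return False

    for e in range(len(edges)):
        if not try_assign(e, set()):
            return None                    # Hall's condition fails for this d
    head = {e: v for (v, _), e in slot_owner.items()}
    return [(u if head[e] == v else v, head[e]) for e, (u, v) in enumerate(edges)]

# K_{2,4} with parts {0, 1} and {2, 3, 4, 5}
V = list(range(6))
E = [(a, b) for a in (0, 1) for b in (2, 3, 4, 5)]
d = ceil(max_density(V, E))
arcs = bounded_orientation(E, d)
```

Each edge is directed toward the vertex whose copy it is matched to, so no vertex receives more than {d} incoming edges.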

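Now we can prove Theorem 7. Proof: According to Lemma 19, pick {D \in \mathop{\mathrm{D}}_G(d_1, \dots, d_n)} where {\max d_i \le \left\lceil L(G) \right\rceil}. Every Eulerian suborientation decomposes into edge-disjoint directed cycles, and since {G} is bipartite it has no odd cycles; hence {\mathop{\mathrm{EO}}(D) = \varnothing}. On the other hand {\mathop{\mathrm{EE}}(D)} contains the empty suborientation, so {\left\lvert \mathop{\mathrm{EE}}(D) \right\rvert \neq \left\lvert \mathop{\mathrm{EO}}(D) \right\rvert}. So Theorem 18 applies and we are done. \Box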

Vinogradov’s Three-Prime Theorem (with Sammy Luo and Ryan Alweiss)

This was my final paper for 18.099, seminar in discrete analysis, jointly with Sammy Luo and Ryan Alweiss.

We prove that every sufficiently large odd integer can be written as the sum of three primes, conditioned on a strong form of the prime number theorem.

1. Introduction

In this paper, we prove the following result:

Theorem 1 (Vinogradov)

Every sufficiently large odd integer {N} is the sum of three prime numbers.

In fact, the following result is also true, called the “weak Goldbach conjecture”.

Theorem 2 (Weak Goldbach conjecture)

Every odd integer {N \ge 7} is the sum of three prime numbers.

The proof of Vinogradov’s theorem becomes significantly simpler if one assumes the generalized Riemann hypothesis; this allows one to use a strong form of the prime number theorem (Theorem 9). This conditional proof was given by Hardy and Littlewood in 1923. In 1997, Deshouillers, Effinger, te Riele and Zinoviev showed that the generalized Riemann hypothesis in fact also implies the weak Goldbach conjecture by improving the bound to {10^{20}} and then exhausting the remaining cases via a computer search.

As for unconditional proofs, Vinogradov was able to eliminate the dependency on the generalized Riemann hypothesis in 1937, which is why Theorem 1 bears his name. However, Vinogradov’s bound used the ineffective Siegel-Walfisz theorem; his student K. Borozdin showed that {3^{3^{15}}} is large enough. Over the years the bound was improved, until 2013, when Harald Helfgott claimed the first unconditional proof of Theorem 2.

In this exposition we follow Hardy and Littlewood’s approach, i.e. we prove Theorem 1 assuming the generalized Riemann hypothesis, following the exposition of Rhee. An exposition of the unconditional proof by Vinogradov is given by Rouse.

2. Synopsis

We are going to prove that

\displaystyle  	\sum_{a+b+c = N} \Lambda(a) \Lambda(b) \Lambda(c) \asymp \frac12 N^2 \mathfrak G(N) 	 \ \ \ \ \ (1)


where

\displaystyle  \mathfrak G(N) \overset{\text{def}}{=} \prod_{p \mid N} \left( 1 - \frac{1}{(p-1)^2} \right) \prod_{p \nmid N} \left( 1 + \frac{1}{(p-1)^3} \right)

and {\Lambda} is the von Mangoldt function defined as usual. Then so long as {2 \nmid N}, the quantity {\mathfrak G(N)} will be bounded away from zero; thus (1) will imply that in fact there are many ways to write {N} as the sum of three distinct prime numbers.
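To get a feel for the size of {\mathfrak G(N)}, one can evaluate a truncation of the product numerically. The sketch below truncates at primes up to a cutoff `prime_limit`, which is an arbitrary illustrative choice (the omitted factors are {1 + O(p^{-2})}, so the tail is close to {1}); the function names are hypothetical.

```python
def primes_up_to(limit):
    """Sieve of Eratosthenes returning all primes <= limit."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [p for p in range(limit + 1) if sieve[p]]

def singular_series(N, prime_limit=10_000):
    """Truncated product for G(N) over the primes p <= prime_limit."""
    value = 1.0
    for p in primes_up_to(prime_limit):
        if N % p == 0:
            value *= 1 - 1 / (p - 1) ** 2
        else:
            value *= 1 + 1 / (p - 1) ** 3
    return value
```

For even {N} the factor at {p = 2} is {1 - 1/(2-1)^2 = 0}, so the product vanishes; for odd {N} the factor at {p = 2} equals {2} and the remaining factors keep the product bounded away from zero.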

The sum (1) is estimated using Fourier analysis. Let us define the following.

Definition 3

Let {\mathbb T = \mathbb R/\mathbb Z} denote the circle group, and let {e : \mathbb T \rightarrow \mathbb C} be the exponential function {\theta \mapsto \exp(2\pi i \theta)}. For {\alpha\in\mathbb T}, {\|\alpha\|} denotes the minimal distance from {\alpha} to an integer.

Note that {|e(\theta)-1|=\Theta(\|\theta\|)}.

Definition 4

For {\alpha \in \mathbb T} and {x > 0} we define

\displaystyle  S(x, \alpha) = \sum_{n \le x} \Lambda(n) e(n\alpha).

Then we can rewrite (1) using {S} as a “Fourier coefficient”:

Proposition 5

We have

\displaystyle  		\sum_{a+b+c = N} \Lambda(a) \Lambda(b) \Lambda(c) 		= \int_{\alpha \in \mathbb T} S(N, \alpha)^3 e(-N\alpha) \; d\alpha. 		 	\ \ \ \ \ (2)

Proof: We have

\displaystyle S(N,\alpha)^3=\sum_{a,b,c\leq N}\Lambda(a)\Lambda(b)\Lambda(c)e((a+b+c)\alpha),

so


\displaystyle  \begin{aligned} \int_{\alpha \in \mathbb T} S(N, \alpha)^3 e(-N\alpha) \; d\alpha &= \int_{\alpha \in \mathbb T} \sum_{a,b,c\leq N}\Lambda(a)\Lambda(b)\Lambda(c)e((a+b+c)\alpha) e(-N\alpha) \; d\alpha \\ &= \sum_{a,b,c\leq N}\Lambda(a)\Lambda(b)\Lambda(c)\int_{\alpha \in \mathbb T}e((a+b+c-N)\alpha) \; d\alpha \\ &= \sum_{a,b,c\leq N}\Lambda(a)\Lambda(b)\Lambda(c)I(a+b+c=N) \\ &= \sum_{a+b+c=N}\Lambda(a)\Lambda(b)\Lambda(c), \end{aligned}

as claimed. \Box
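Proposition 5 can be sanity-checked numerically for small {N}. The integrand is a trigonometric polynomial whose frequencies {a+b+c-N} lie between {3-N} and {2N}, so averaging it over {2N+1} equally spaced points reproduces the integral exactly up to rounding; that discretization is a choice made for this sketch, and the function names are hypothetical.

```python
import cmath
from math import log

def von_mangoldt(n):
    """Lambda(n) = log p if n is a power of the prime p, and 0 otherwise."""
    if n < 2:
        return 0.0
    for p in range(2, n + 1):
        if n % p == 0:                     # p is the smallest prime factor of n
            while n % p == 0:
                n //= p
            return log(p) if n == 1 else 0.0

def triple_sum(N):
    """Left-hand side of (2): sum of Lambda(a)Lambda(b)Lambda(c) over a+b+c = N."""
    lam = [von_mangoldt(n) for n in range(N + 1)]
    return sum(lam[a] * lam[b] * lam[N - a - b]
               for a in range(1, N) for b in range(1, N - a))

def circle_integral(N):
    """Right-hand side of (2), computed as an average over 2N+1 sample points."""
    lam = [von_mangoldt(n) for n in range(N + 1)]
    M = 2 * N + 1
    total = 0
    for k in range(M):
        alpha = k / M
        S = sum(lam[n] * cmath.exp(2j * cmath.pi * n * alpha) for n in range(1, N + 1))
        total += S ** 3 * cmath.exp(-2j * cmath.pi * N * alpha)
    return total / M
```

Comparing `triple_sum(N)` with `circle_integral(N)` for a few small values of `N` should show agreement up to floating-point error.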

In order to estimate the integral in Proposition 5, we divide {\mathbb T} into the so-called “major” and “minor” arcs. Roughly,

  • The “major arcs” are subintervals of {\mathbb T} centered at a rational number with small denominator.
  • The “minor arcs” are the remaining intervals.

These will be made more precise later. This general method is called the Hardy-Littlewood circle method, because of the integral over the circle group {\mathbb T}.

The rest of the paper is structured as follows. In Section 3, we define the Dirichlet character and other number-theoretic objects, and state some estimates for the partial sums of these objects conditioned on the Riemann hypothesis. These bounds are then used in Section 4 to provide corresponding estimates on {S(x, \alpha)}. In Section 5 we then define the major and minor arcs rigorously and use the previous estimates to estimate the integral over both kinds of arcs. Finally, we complete the proof in Section 6.

3. Prime number theorem type bounds

In this section, we collect the necessary number-theoretic results that we will need. It is in this section only that we will require the generalized Riemann hypothesis.

As a reminder, the notation {f(x)\ll g(x)}, where {f} is a complex function and {g} a nonnegative real one, means {f(x)=O(g(x))}, a statement about the magnitude of {f}. Likewise, {f(x)=g(x)+O(h(x))} simply means that for some {C}, {|f(x)-g(x)|\leq C|h(x)|} for all sufficiently large {x}.

3.1. Dirichlet characters

In what follows, {q} denotes a positive integer.

Definition 6

A Dirichlet character {\chi} modulo {q} is a homomorphism {\chi : (\mathbb Z/q)^\times \rightarrow \mathbb C^\times}. It is said to be trivial if {\chi = 1}; we denote this character by {\chi_0}.

By slight abuse of notation, we will also consider {\chi} as a function {\mathbb Z \rightarrow \mathbb C} by setting {\chi(n) = \chi(n \pmod q)} for {\gcd(n,q) = 1} and {\chi(n) = 0} for {\gcd(n,q) > 1}.

Remark 7

The Dirichlet characters modulo {q} form a group of order {\phi(q)} under multiplication, with inverse given by complex conjugation. Note that {\chi(m)} is a {\phi(q)}th root of unity (not necessarily primitive) for any {m \in (\mathbb Z/q)^\times}, thus {\chi} takes values in the unit circle.

Experts may recognize that the Dirichlet characters are just the elements of the Pontryagin dual of {(\mathbb Z/q)^\times}. In particular, they satisfy an orthogonality relationship

\displaystyle  	\frac{1}{\phi(q)} 	\sum_{\chi \text{ mod } q} \chi(n) \overline{\chi(a)} 	= \begin{cases} 		1 & n = a \pmod q \\ 		0 & \text{otherwise} 	\end{cases} 	 \ \ \ \ \ (3)

and thus form an orthonormal basis for functions {(\mathbb Z/q)^\times \rightarrow \mathbb C}.

3.2. Prime number theorem for arithmetic progressions

Definition 8

The generalized Chebyshev function is defined by

\displaystyle  \psi(x, \chi) = \sum_{n \le x} \Lambda(n) \chi(n).

The Chebyshev function is studied extensively in analytic number theory, as it is the most convenient way to phrase the major results of analytic number theory. For example, the prime number theorem is equivalent to the assertion that

\displaystyle  \psi(x, \chi_0) = \sum_{n \le x} \Lambda(n) \asymp x

where {q = 1} (thus {\chi_0} is the constant function {1}). Similarly, Dirichlet’s theorem actually asserts that for any {q \ge 1},

\displaystyle  	\psi(x, \chi) 	= \begin{cases} 		x + o_q(x) & \chi = \chi_0 \text{ trivial} \\ 		o_q(x) & \chi \neq \chi_0 \text{ nontrivial}. 	\end{cases}

However, the error term in these estimates is quite poor (more than {x^{1-\varepsilon}} for every {\varepsilon}). By assuming the Riemann Hypothesis for a certain “{L}-function” attached to {\chi}, we can improve the error terms substantially.

Theorem 9 (Prime number theorem for arithmetic progressions)

Let {\chi} be a Dirichlet character modulo {q}, and assume the Riemann hypothesis for the {L}-function attached to {\chi}.

  1. If {\chi} is nontrivial, then

    \displaystyle  \psi(x, \chi) \ll \sqrt{x} (\log qx)^2.

  2. If {\chi = \chi_0} is trivial, then

    \displaystyle  \psi(x, \chi_0) = x + O\left( \sqrt x (\log x)^2 + \log q \log x \right).

Theorem 9 is the strong estimate that we will require when putting good estimates on {S(x, \alpha)}, and is the only place in which the generalized Riemann Hypothesis is actually required.

3.3. Gauss sums

Definition 10

For {\chi} a Dirichlet character modulo {q}, the Gauss sum {\tau(\chi)} is defined by

\displaystyle \tau(\chi)=\sum_{a=0}^{q-1}\chi(a)e(a/q).

We will need the following fact about Gauss sums.

Lemma 11

Consider Dirichlet characters modulo {q}. Then:

  1. We have {\tau(\chi_0) = \mu(q)}.
  2. For any {\chi} modulo {q}, {\left\lvert \tau(\chi) \right\rvert \le \sqrt q}.
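Both parts of Lemma 11, together with the orthogonality relation (3), can be checked numerically. The sketch below only handles a prime modulus {q}, where all characters are obtained from a primitive root; this restriction is an assumption made to keep the example short, and the function names are hypothetical.

```python
import cmath

def e(t):
    """The exponential e(t) = exp(2 pi i t)."""
    return cmath.exp(2j * cmath.pi * t)

def characters_mod_prime(q):
    """All Dirichlet characters modulo a prime q, each as a list chi[0], ..., chi[q-1]."""
    # Find a primitive root g by brute force, then tabulate discrete logarithms.
    g = next(g for g in range(1, q)
             if len({pow(g, k, q) for k in range(q - 1)}) == q - 1)
    index = {pow(g, k, q): k for k in range(q - 1)}
    chars = []
    for j in range(q - 1):
        # chi_j(g^k) = e(jk / (q-1)), and chi_j(0) = 0.
        chars.append([0j] + [e(j * index[a] / (q - 1)) for a in range(1, q)])
    return chars                           # chars[0] is the trivial character

def gauss_sum(chi, q):
    """tau(chi) = sum over a mod q of chi(a) e(a/q)."""
    return sum(chi[a] * e(a / q) for a in range(q))

q = 7
chars = characters_mod_prime(q)
taus = [gauss_sum(chi, q) for chi in chars]
```

For prime {q} we have {\mu(q) = -1}, and for nontrivial {\chi} the bound {\left\lvert \tau(\chi) \right\rvert \le \sqrt q} is attained with equality.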

3.4. Dirichlet approximation

We finally require the Dirichlet approximation theorem in the following form.

Theorem 12 (Dirichlet approximation)

Let {\alpha \in \mathbb R} be arbitrary, and {M} a fixed positive integer. Then there exist integers {a} and {q = q(\alpha)}, with {1 \le q \le M} and {\gcd(a,q) = 1}, satisfying

\displaystyle  \left\lvert \alpha - \frac aq \right\rvert \le \frac{1}{qM}.
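A pair {(a, q)} as in Theorem 12 can be found by simply trying every denominator up to {M}. The sketch below assumes {\alpha} is given as an exact `Fraction` so that the inequality is tested without rounding error; the function name `dirichlet_approx` is hypothetical.

```python
from fractions import Fraction
from math import gcd

def dirichlet_approx(alpha, M):
    """Return coprime (a, q) with 1 <= q <= M and |alpha - a/q| <= 1/(q*M)."""
    for q in range(1, M + 1):
        a = round(alpha * q)               # nearest integer to q * alpha
        if abs(alpha - Fraction(a, q)) <= Fraction(1, q * M):
            g = gcd(a, q)
            # Reducing the fraction only shrinks q, so the bound still holds.
            return a // g, q // g
    raise AssertionError("unreachable by the Dirichlet approximation theorem")

a, q = dirichlet_approx(Fraction(314159, 100000), 10)
```

With {\alpha = 3.14159} and {M = 10} this search returns the familiar approximation {22/7}.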

4. Bounds on {S(x, \alpha)}

In this section, we use our number-theoretic results to bound {S(x,\alpha)}.

First, we provide a bound for {S(x,\alpha)} if {\alpha} is a rational number with “small” denominator {q}.

Lemma 13

Let {\gcd(a,q) = 1}. Assuming Theorem 9, we have

\displaystyle  S(x, a/q) 		= \frac{\mu(q)}{\phi(q)} x + O\left( \sqrt{qx} (\log qx)^2 \right)

where {\mu} denotes the Möbius function.

Proof: Write the sum as

\displaystyle  S(x, a/q) = \sum_{n \le x} \Lambda(n) e(na/q).

First we claim that the terms {\gcd(n,q) > 1} (and {\Lambda(n) \neq 0}) contribute a negligibly small {\ll \log q \log x}. To see this, note that

  • The number {q} has {\ll \log q} distinct prime factors, and
  • For each prime {p \mid q}, if {p^e} is the largest power of {p} not exceeding {x}, then {\Lambda(p) + \dots + \Lambda(p^e) = e\log p = \log(p^e) \le \log x}.

So consider only terms with {\gcd(n,q) = 1}. To bound the sum, notice that

\displaystyle  \begin{aligned} 		e(n \cdot a/q) &= \sum_{b \text{ mod } q} e(b/q) \cdot \mathbf 1(b \equiv an) \\ 		&= \sum_{b \text{ mod } q} e(b/q) \left( \frac{1}{\phi(q)} 			\sum_{\chi \text{ mod } q} \chi(b) \overline{\chi(an)} \right) 	\end{aligned}

by the orthogonality relations. Now we swap the order of summation to obtain a Gauss sum:

\displaystyle  \begin{aligned} 		e(n \cdot a/q) &= \frac{1}{\phi(q)} \sum_{\chi \text{ mod } q} \overline{\chi(an)} 			\left( \sum_{b \text{ mod } q} \chi(b) e(b/q) \right) \\ 		&= \frac{1}{\phi(q)} \sum_{\chi \text{ mod } q} \overline{\chi(an)} \tau(\chi). 	\end{aligned}

Thus, writing {\alpha = a/q} and ignoring the negligible terms with {\gcd(n,q) > 1}, we swap the order of summation to obtain that

\displaystyle  \begin{aligned} 		S(x, \alpha) &= \sum_{\substack{n \le x \\ \gcd(n,q) = 1}} 			\Lambda(n) e(n \cdot a/q) \\ 		&= \frac{1}{\phi(q)} \sum_{\substack{n \le x \\ \gcd(n,q) = 1}} 			\sum_{\chi \text{ mod } q} \Lambda(n) \overline{\chi(an)} \tau(\chi) \\ 		&= \frac{1}{\phi(q)} \sum_{\chi \text{ mod } q} \tau(\chi) 			\sum_{\substack{n \le x \\ \gcd(n,q) = 1}} \Lambda(n) \overline{\chi(an)} \\ 		&= \frac{1}{\phi(q)} \sum_{\chi \text{ mod } q} \overline{\chi(a)} \tau(\chi) 			\sum_{\substack{n \le x \\ \gcd(n,q) = 1}} \Lambda(n)\overline{\chi(n)} \\ 		&= \frac{1}{\phi(q)} \sum_{\chi \text{ mod } q} \overline{\chi(a)} 			\tau(\chi) \psi(x, \overline\chi) \\ 		&= \frac{1}{\phi(q)} \left( \tau(\chi_0) \psi(x, \chi_0) 			+ \sum_{1 \neq \chi \text{ mod } q} \overline{\chi(a)} \tau(\chi) 			\psi(x, \overline\chi) \right). 	\end{aligned}

Now applying both parts of Lemma 11 in conjunction with Theorem 9 gives

\displaystyle  \begin{aligned} S(x,\alpha) &= \frac{\mu(q)}{\phi(q)} \left( x + O\left( \sqrt x (\log qx)^2 \right) \right) + O\left( \sqrt{qx} (\log qx)^2 \right) \\ &= \frac{\mu(q)}{\phi(q)} x + O\left( \sqrt{qx} (\log qx)^2 \right) \end{aligned}

as desired. \Box

We then provide a bound when {\alpha} is “close to” such an {a/q}.

Lemma 14

Let {\gcd(a,q) = 1} and {\beta \in \mathbb T}. Assuming Theorem 9, we have

\displaystyle  		S(x, a/q + \beta) = 		\frac{\mu(q)}{\phi(q)} \left( \sum_{n \le x} e(\beta n) \right) 		+ O\left( (1+\|\beta\|x) \sqrt{qx} (\log qx)^2 \right).

Proof: For convenience let us assume {x \in \mathbb Z}. Let {\alpha = a/q + \beta}. Let us denote {\text{Err}(x, \alpha) = S(x,a/q) - \frac{\mu(q)}{\phi(q)} x} (note the argument {a/q} rather than {\alpha}), so by Lemma 13 we have {\text{Err}(x,\alpha) \ll \sqrt{qx}(\log qx)^2}. We have

\displaystyle  \begin{aligned} S(x, \alpha) &= \sum_{n \le x} \Lambda(n) e(na/q) e(n\beta) \\ &= \sum_{n \le x} e(n\beta) \left( S(n, a/q) - S(n-1, a/q) \right) \\ &= \sum_{n \le x} e(n\beta) \left( \frac{\mu(q)}{\phi(q)} + \text{Err}(n, \alpha) - \text{Err}(n-1, \alpha) \right) \\ &= \frac{\mu(q)}{\phi(q)} \left( \sum_{n \le x} e(n\beta) \right) - \sum_{1 \le m \le x-1} \left( e( (m+1)\beta) - e( m\beta ) \right) \text{Err}(m, \alpha) \\ &\qquad + e(x\beta) \text{Err}(x, \alpha) - e(\beta) \text{Err}(0, \alpha) \\ &= \frac{\mu(q)}{\phi(q)} \left( \sum_{n \le x} e(n\beta) \right) + O\left( \sum_{1 \le m \le x-1} \|\beta\| \left\lvert \text{Err}(m, \alpha) \right\rvert + \left\lvert \text{Err}(0, \alpha) \right\rvert + \left\lvert \text{Err}(x, \alpha) \right\rvert \right) \\ &= \frac{\mu(q)}{\phi(q)} \left( \sum_{n \le x} e(n\beta) \right) + O\left( \left( 1+x\left\| \beta \right\| \right) \sqrt{qx} (\log qx)^2 \right) \end{aligned}

as desired. \Box

Thus if {\alpha} is close to a fraction with small denominator, the value of {S(x, \alpha)} is bounded above. We can now combine this with the Dirichlet approximation theorem to obtain the following general result.

Corollary 15

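Suppose {M = N^{2/3}} and suppose {\left\lvert \alpha - a/q \right\rvert \le \frac{1}{qM}} for some {\gcd(a,q) = 1} with {q \le M}. Assuming Theorem 9, we have

\displaystyle  S(N, \alpha) \ll \frac{N}{\phi(q)} + N^{\frac56+\varepsilon}

for any {\varepsilon > 0}.

Proof: Apply Lemma 14 with {x = N} and {\beta = \alpha - a/q}. The main term is at most {N/\phi(q)} in absolute value. For the error term, {\|\beta\| N \le N/(qM)} and {q \le M = N^{2/3}} give {(1+\|\beta\|N)\sqrt{qN} \le \sqrt{MN} + N^{3/2}/M \ll N^{5/6}}, and the logarithmic factor is absorbed into {N^{\varepsilon}}. \Box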

5. Estimation of the arcs

We’ll write

\displaystyle  f(\alpha) \overset{\text{def}}{=} S(N,\alpha)=\sum_{n \le N} \Lambda(n)e(n\alpha)

for brevity in this section.

Recall that we wish to bound the right-hand side of (2) in Proposition 5. We split {[0,1]} into two sets, which we call the “major arcs” and the “minor arcs.” To do so, we use Dirichlet approximation, as hinted at earlier.

In what follows, fix

\displaystyle  \begin{aligned} 	M &= N^{2/3} \\ 	K &= (\log N)^{10}. \end{aligned}

5.1. Setting up the arcs

Definition 16

For {q \le K} and {\gcd(a,q) = 1}, {1 \le a \le q}, we define

\displaystyle  \mathfrak M(a,q) = \left\{ \alpha \in \mathbb T \mid \left\lvert \alpha - \frac aq \right\rvert \le \frac{1}{qM} \right\}.

These will be the major arcs. The union of all major arcs is denoted by {\mathfrak M}. The complement is denoted by {\mathfrak m}.

Equivalently, for any {\alpha}, consider {q = q(\alpha) \le M} as in Theorem 12. Then {\alpha \in \mathfrak M} if {q \le K} and {\alpha \in \mathfrak m} otherwise.

Proposition 17

{\mathfrak M} is composed of finitely many disjoint intervals {\mathfrak M(a,q)} with {q \le K}. The complement {\mathfrak m} is nonempty.

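Proof: Note that if {q_1, q_2 \le K} and {a/q_1 \neq b/q_2} then {\left\lvert \frac{a}{q_1} - \frac{b}{q_2} \right\rvert \ge \frac{1}{q_1q_2} \ge \frac{1}{K^2}}, which exceeds {\frac 2M} once {N} is large, so the arcs are pairwise disjoint. There are at most {K^2} arcs, each of length at most {\frac 2M}, so their total length is less than {1} and {\mathfrak m} is nonempty. \Box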

In particular both {\mathfrak M} and {\mathfrak m} are measurable. Thus we may split the integral in (2) over {\mathfrak M} and {\mathfrak m}. This integral will have large magnitude on the major arcs, and small magnitude on the minor arcs, so overall, over the whole interval {[0,1]}, it will have large magnitude.

5.2. Estimate of the minor arcs

First, we note the well known fact {\phi(q) \gg q/\log q}. Note also that if {q=q(\alpha)} as in the last section and {\alpha} is on a minor arc, we have {q > (\log N)^{10}}, and thus {\phi(q) \gg (\log N)^{9}}.

As such, Corollary 15 yields that {f(\alpha) \ll \frac{N}{\phi(q)}+N^{.834} \ll \frac{N}{(\log N)^9}}. Hence


\displaystyle  \begin{aligned} 	\left\lvert \int_{\mathfrak m}f(\alpha)^3e(-N\alpha) \; d\alpha \right\rvert 	&\le \int_{\mathfrak m}\left\lvert f(\alpha)\right\rvert ^3 \; d\alpha \\ 	&\ll \frac{N}{(\log N)^9} \int_{0}^{1}\left\lvert f(\alpha)\right\rvert ^2 \;d\alpha \\ 	&=\frac{N}{(\log N)^9}\int_{0}^{1}f(\alpha)f(-\alpha) \; d\alpha \\ 	&=\frac{N}{(\log N)^9}\sum_{n \le N} \Lambda(n)^2 \\ 	&\ll \frac{N^2}{(\log N)^8}, \end{aligned}

using the well known bound {\sum_{n \le N} \Lambda(n)^2 \ll N \log N}. This bound of {\frac{N^2}{(\log N)^8}} will be negligible compared to lower bounds for the major arcs in the next section.

5.3. Estimate on the major arcs

We show that

\displaystyle  \int_{\mathfrak M}f(\alpha)^3e(-N\alpha) d\alpha \asymp \frac{N^2}{2} \mathfrak G(N).

By Proposition 17 we can split the integral over each interval and write

\displaystyle  \int_{\mathfrak M} f(\alpha)^3e(-N\alpha) \; d\alpha 	= \sum_{q \le (\log N)^{10}}\sum_{\substack{1 \le a \le q \\ \gcd(a,q)=1}} 	\int_{-1/qM}^{1/qM}f(a/q+\beta)^3e(-N(a/q+\beta)) \; d\beta.

Then we apply Lemma 14, which gives

\displaystyle  \begin{aligned} 	f(a/q+\beta)^3 	&= \left(\frac{\mu(q)}{\phi(q)}\sum_{n \le N}e(\beta n) \right)^3 \\ 	&+\left(\frac{\mu(q)}{\phi(q)}\sum_{n \le N}e(\beta n)\right)^2 		O\left((1+\|\beta\|N)\sqrt{qN} \log^2 qN\right) \\ 	&+\left(\frac{\mu(q)}{\phi(q)}\sum_{n \le N}e(\beta n)\right) 		O\left((1+\|\beta\|N)\sqrt{qN} \log^2 qN\right)^2 \\ 	&+O\left((1+\|\beta\|N)\sqrt{qN} \log^2 qN\right)^3. \end{aligned}

Now, we can do casework on the side of {N^{-.9}} that {\|\beta\|} lies on.

  • If {\|\beta\| \gg N^{-.9}}, we have {\sum_{n \le N}e(\beta n) \ll \frac{2}{|e(\beta)-1|} 	\ll \frac{1}{\|\beta\|} \ll N^{.9}}, and {(1+\|\beta\|N)\sqrt{qN} \log^2 qN \ll N^{5/6+\varepsilon}}, because certainly we have {\|\beta\|<1/M=N^{-2/3}}.
  • If on the other hand {\|\beta\|\ll N^{-.9}}, we have {\sum_{n \le N}e(\beta n) \ll N} obviously, and {(1+\|\beta\|N)\sqrt{qN} \log^2 qN \ll N^{3/5+\varepsilon}}.

As such, we obtain

\displaystyle  f(a/q+\beta)^3 = \left( \frac{\mu(q)}{\phi(q)}\sum_{n \le N}e(\beta n) \right)^3 + O\left(N^{79/30+\varepsilon}\right)

in either case. Thus, we can write

\displaystyle  \begin{aligned} &\qquad \int_{\mathfrak M} f(\alpha)^3e(-N\alpha) \; d\alpha \\ &= \sum_{q \le (\log N)^{10}} \sum_{\substack{1 \le a \le q \\ \gcd(a,q)=1}} \int_{-1/qM}^{1/qM} f(a/q+\beta)^3e(-N(a/q+\beta)) \; d\beta \\ &= \sum_{q \le (\log N)^{10}} \sum_{\substack{1 \le a \le q \\ \gcd(a,q)=1}} \int_{-1/qM}^{1/qM}\left[\left(\frac{\mu(q)}{\phi(q)}\sum_{n \le N}e(\beta n)\right)^3 + O\left(N^{79/30+\varepsilon}\right)\right]e(-N(a/q+\beta)) \; d\beta \\ &=\sum_{q \le (\log N)^{10}} \frac{\mu(q)}{\phi(q)^3} \left(\sum_{\substack{1 \le a \le q \\ \gcd(a,q)=1}} e(-N(a/q))\right) \left( \int_{-1/qM}^{1/qM}\left(\sum_{n \le N}e(\beta n)\right)^3e(-N\beta) \; d\beta \right ) \\ &\qquad +O\left(N^{59/30+\varepsilon}\right). \end{aligned}

just using {M = N^{2/3}} to bound the total length of the major arcs. Now, we use

\displaystyle  \sum_{n \le N}e(\beta n) = e(\beta) \cdot \frac{1-e(\beta N)}{1-e(\beta)} \ll \frac{1}{\|\beta\|}.

This enables us to bound the expression

\displaystyle  \int_{1/qM}^{1-1/qM}\left (\sum_{n \le N}e(\beta n)\right) ^ 3 e(-N\beta)d\beta 	\ll \int_{1/qM}^{1-1/qM}\|\beta\|^{-3} d\beta = 2\int_{1/qM}^{1/2}\beta^{-3} d\beta 	\ll q^2M^2.

But the integral over the entire interval is

\displaystyle  \begin{aligned} \int_{0}^{1}\left(\sum_{n \le N}e(\beta n) \right)^3 e(-N\beta) \; d\beta &= \int_{0}^{1} \sum_{a,b,c \le N} e((a+b+c-N)\beta) \; d\beta \\ &= \sum_{a,b,c \le N} \mathbf 1(a+b+c=N) \\ &= \binom{N-1}{2}. \end{aligned}
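
Here is a small sketch that checks this count in two ways: by brute force, and by replacing the integral with an exact average over {Q = 2N+1} equally spaced points. The discrete average is our substitute for the integral; it agrees with it because every frequency {a+b+c-N} has absolute value at most {2N < Q}.

```python
import cmath
from math import comb

def e(x):
    """e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * cmath.pi * x)

def brute_force_count(N):
    """Ordered triples (a, b, c) of positive integers with a + b + c = N."""
    return sum(1 for a in range(1, N + 1)
                 for b in range(1, N + 1 - a)
                 if N - a - b >= 1)

def integral_via_orthogonality(N):
    """Average of F(beta)^3 e(-N beta) over beta = j/Q, with Q = 2N + 1."""
    Q = 2 * N + 1  # larger than every frequency |a + b + c - N| <= 2N
    total = 0
    for j in range(Q):
        beta = j / Q
        F = sum(e(beta * n) for n in range(1, N + 1))
        total += F ** 3 * e(-N * beta)
    return total / Q

N = 12
print(brute_force_count(N), comb(N - 1, 2))  # 55 55
```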

Considering the difference of the two integrals gives

\displaystyle  \int_{-1/qM}^{1/qM}\left(\sum_{n \le N}e(\beta n) \right)^3 	e(-N\beta) \; d\beta - \frac{N^2}{2} \ll q^2 M^2 + N 	\ll (\log N)^c N^{4/3},

for some absolute constant {c}.

For brevity, let

\displaystyle  S_q = \sum_{\substack{1 \le a \le q \\ \gcd(a,q)=1}} e(-N(a/q)).


Then the major arc integral becomes

\displaystyle  \begin{aligned} \int_{\mathfrak M} f(\alpha)^3e(-N\alpha) \; d\alpha &= \sum_{q \le (\log N)^{10}} \frac{\mu(q)}{\phi(q)^3}S_q \left( \int_{-1/qM}^{1/qM}\left(\sum_{n \le N}e(\beta n)\right)^3e(-N\beta) \; d\beta \right ) \\ &\qquad +O\left(N^{59/30+\varepsilon}\right) \\ &= \frac{N^2}{2}\sum_{q \le (\log N)^{10}} \frac{\mu(q)}{\phi(q)^3}S_q + O((\log N)^{10+c} N^{4/3}) + O(N^{59/30+\varepsilon}) \\ &= \frac{N^2}{2}\sum_{q \le (\log N)^{10}} \frac{\mu(q)}{\phi(q)^3}S_q + O(N^{59/30+\varepsilon}). \end{aligned}


Since {S_q} is a sum of {\phi(q)} terms of absolute value {1}, we have {|S_q| \le \phi(q)}. So,

\displaystyle \left\lvert \sum_{q>(\log N)^{10}} 	\frac{\mu(q)}{\phi(q)^3} S_q \right\rvert 	 \le \sum_{q>(\log N)^{10}} \frac{1}{\phi(q)^2},

which is the tail of a convergent series, since {\phi(q)^2 \gg q^c} for some {c > 1}. So

\displaystyle  \int_{\mathfrak M} f(\alpha)^3e(-N\alpha) \; d\alpha 	= \frac{N^2}{2}\sum_{q = 1}^\infty \frac{\mu(q)}{\phi(q)^3}S_q 	+ O(N^{59/30+\varepsilon}).

Now, since {\mu(q)}, {\phi(q)}, and {S_q} are multiplicative functions of {q}, and {\mu(q)=0} unless {q} is squarefree,

\displaystyle  \begin{aligned} \sum_{q = 1}^\infty \frac{\mu(q)}{\phi(q)^3} S_q &= \prod_p \left(1+\frac{\mu(p)}{\phi(p)^3}S_p \right) \\ &= \prod_p \left(1-\frac{1}{(p-1)^3} \sum_{a=1}^{p-1} e(-N(a/p))\right) \\ &= \prod_p \left(1-\frac{1}{(p-1)^3} (p\cdot \mathbf 1(p|N) - 1)\right) \\ &= \prod_{p|N}\left(1-\frac{1}{(p-1)^2}\right) \prod_{p \nmid N}\left(1+\frac{1}{(p-1)^3}\right) \\ &= \mathfrak G(N). \end{aligned}

Therefore,


\displaystyle \int_{\mathfrak M} f(\alpha)^3e(-N\alpha) \; d\alpha = \frac{N^2}{2}\mathfrak{G}(N) + O(N^{59/30+\varepsilon}).
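
The two facts about {S_q} used in the Euler product, namely {S_p = p \cdot \mathbf 1(p|N) - 1} for primes {p} and multiplicativity in {q}, can be checked numerically. The following is a minimal sketch; the function name `S` and the sample values are ours.

```python
import cmath
from math import gcd

def e(x):
    """e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * cmath.pi * x)

def S(q, N):
    """S_q = sum of e(-N a / q) over 1 <= a <= q with gcd(a, q) = 1."""
    return sum(e(-N * a / q) for a in range(1, q + 1) if gcd(a, q) == 1)

N = 45  # divisible by 3 and 5, but not by 7 or 11
for p in (3, 5, 7, 11):
    expected = p * (N % p == 0) - 1
    assert abs(S(p, N) - expected) < 1e-9

# multiplicativity on a coprime pair: S_15 = S_3 * S_5
assert abs(S(15, N) - S(3, N) * S(5, N)) < 1e-9
```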

When {N} is odd,

\displaystyle  \mathfrak{G}(N) = \prod_{p|N}\left(1-\frac{1}{(p-1)^2}\right)\prod_{p \nmid N}\left(1+\frac{1}{(p-1)^3}\right)\geq \prod_{m\geq 3}\left(\frac{m-2}{m-1}\frac{m}{m-1}\right)=\frac{1}{2},

so that we have

\displaystyle \int_{\mathfrak M} f(\alpha)^3e(-N\alpha) \; d\alpha \asymp \frac{N^2}{2}\mathfrak{G}(N),

as desired.
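
To get a feel for the size of {\mathfrak G(N)}, here is a rough numerical sketch that truncates the Euler product at a finite prime bound. The truncation point `prime_limit` is an assumption made for illustration. Since every prime dividing {N} is included and each omitted factor with {p \nmid N} exceeds {1}, the truncated product is a lower bound for {\mathfrak G(N)}.

```python
def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes <= limit (limit >= 2)."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [p for p in range(2, limit + 1) if sieve[p]]

def singular_series(N, prime_limit=10_000):
    """Euler product for G(N), truncated at max(prime_limit, N)."""
    product = 1.0
    for p in primes_up_to(max(prime_limit, N)):
        if N % p == 0:
            product *= 1 - 1 / (p - 1) ** 2  # factor for p | N
        else:
            product *= 1 + 1 / (p - 1) ** 3  # factor for p not dividing N
    return product

for N in (3, 15, 105, 1001, 15015):
    assert singular_series(N) > 0.5  # odd N: bounded away from zero

print(singular_series(10))  # 0.0, since the factor at p = 2 vanishes
```

The last line shows why the parity hypothesis matters: for even {N} the factor at {p=2} is {1 - 1/(2-1)^2 = 0}.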

6. Completing the proof

Because the integral over the minor arc is {o(N^2)}, it follows that

\displaystyle \sum_{a+b+c=N} \Lambda(a)\Lambda(b)\Lambda(c) = \int_{0}^{1} f(\alpha)^3 e(-N\alpha) d \alpha \asymp \frac{N^2}{2}\mathfrak{G}(N) \gg N^2.

Consider the set {S_N} of integers {p^k\leq N} with {k>1}. We must have {p \le N^{\frac{1}{2}}}, and for each such {p} there are at most {O(\log N)} possible values of {k}. As such, {|S_N| \ll\pi(N^{1/2}) \log N\ll N^{1/2}}.


Then

\displaystyle \sum_{\substack{a+b+c=N \\ a\in S_N}} \Lambda(a)\Lambda(b)\Lambda(c) \ll (\log N)^3 |S_N| N \ll (\log N)^3 N^{3/2},

and similarly for {b\in S_N} and {c\in S_N}. Since {\Lambda} vanishes away from prime powers, summing over {a\in S_N} is equivalent to summing over composite {a}, so

\displaystyle  \sum_{p_1+p_2+p_3=N} \Lambda(p_1)\Lambda(p_2)\Lambda(p_3) 	=\sum_{a+b+c=N} \Lambda(a)\Lambda(b)\Lambda(c) + O(\log(N)^3 N^{3/2}) 	\gg N^2,

where the sum is over primes {p_i}. This finishes the proof.
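
For small {N} one can compute both weighted sums directly and see that the prime-power terms contribute only a small part. The sketch below does this by brute force in {O(N^2)} time. The function names are ours, and it only illustrates the final step; it is not a check of the asymptotic.

```python
from math import log

def mangoldt_tables(N):
    """Return (lam, lam_prime) indexed 0..N:
    lam[n] = log p if n = p^k with k >= 1, else 0;
    lam_prime[n] = log n if n is prime, else 0."""
    lam = [0.0] * (N + 1)
    lam_prime = [0.0] * (N + 1)
    composite = [False] * (N + 1)
    for p in range(2, N + 1):
        if not composite[p]:
            for m in range(2 * p, N + 1, p):
                composite[m] = True
            lam_prime[p] = log(p)
            q = p
            while q <= N:  # p, p^2, p^3, ...
                lam[q] = log(p)
                q *= p
    return lam, lam_prime

def weighted_counts(N):
    """Sum of Lambda(a)Lambda(b)Lambda(c) over a+b+c=N: all terms, primes only."""
    lam, lam_prime = mangoldt_tables(N)
    full = prime_only = 0.0
    for a in range(1, N - 1):
        for b in range(1, N - a):
            c = N - a - b
            full += lam[a] * lam[b] * lam[c]
            prime_only += lam_prime[a] * lam_prime[b] * lam_prime[c]
    return full, prime_only

full, prime_only = weighted_counts(101)
assert 0 < prime_only <= full
```

Comparing `full` with {N^2/2} for a few odd {N} gives a rough sense of the main term, though at this scale the lower-order terms are still large.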


Apparently even people on Quora want to know why I transferred from Harvard to MIT. Since I’ve been asked this question way too many times, I guess I should give an answer, once and for all.

There were plenty of reasons (and anti-reasons). I should say some anti-reasons first to give due credit: the Harvard math department is fantastic, and Harvard gives you significantly more freedom than MIT to take whatever you want. These were the main reasons why transferring was a difficult decision, and in fact I’m only ~70% sure I made the right choice.

Ultimately, the main reason I transferred was the housing.

At MIT, you basically get to choose where you live. All the dorms, and even floors within dorms, are different: living on 3rd West versus living on 5th East might as well be going to different colleges. Even if for some bizarre reason you hate 90% of the students at MIT you can still have a fantastic social experience if you’re in a dorm you like.

This is not true at Harvard, which shoves you in dorms more or less at random. Specifically,

  • In freshman year, you are assigned a random dorm, and eat in a segregated dining hall (Annenberg) exclusively with freshmen. All students are placed on a mandatory unlimited meal plan, I guess to discourage them from eating out.
  • After freshman year, you get a random House, and eat in a dining hall built into the House. There are restrictions that make it deliberately difficult to eat at other Houses.

The result of this random mixing is that (a) you only know people in your own year, and (b) there is zero dorm culture. Lounges are deserted, doors are shut, and people are unfindable; in fact I still don’t know the names of the students who lived next door to me. This is a bigger deal than people give it credit for: students are busy and campus is large, so you don’t really see someone unless you share a class, live near them, or date them. For example, I rarely talked to James Tao, even though we’d known each other for three years beforehand and had plenty in common.

Put more harshly: “Harvard’s dominant typical social tone is superficial, inane, and too frequently alcohol-drenched to be interesting. It actively thwarts any attempts to escape this atmosphere, by assigning groups of students to dorms randomly — thus guaranteeing all students a more-or-less uniformly superficial, inane and alcohol-drenched experience.”

The problems I mentioned were worse for me specifically since I took exclusively upper-level math courses. My classmates were all upperclassmen who already knew each other and ate/lived elsewhere. For my own meals, the typical Annenberg conversation was either classes or gossip, so I had little to say to the other freshmen (if I talked about my classes I sounded like a showoff). I was often sitting alone in my room, which was great for learning category theory but not so much for my mood. I ended up moving into an MIT dorm for a good chunk of the school year, where it was much easier to find people I could relate well to (because they all lived in one place).

At Harvard I was constantly isolated and bored. I got sick of it and left.