A Sketchy Overview of Green-Tao

These are the notes of my last lecture in the 18.099 discrete analysis seminar. It is a very high-level overview of the Green-Tao theorem. It is a subset of this paper.

1. Synopsis

This post is an overview of the proof of:

Theorem 1 (Green-Tao)

The prime numbers contain arbitrarily long arithmetic progressions.

Here, Szemerédi’s theorem isn’t strong enough, because the primes have density approaching zero. Instead, one can try to prove the following “relative” result.

Theorem (Relative Szemerédi)

Let {S} be a sparse “pseudorandom” set of integers. Then subsets {A} of {S} with positive density in {S} contain arbitrarily long arithmetic progressions.

In order to do this, we have to accomplish the following.

  • Make precise the notion of “pseudorandom”.
  • Prove the Relative Szemerédi theorem, and then
  • Exhibit a “pseudorandom” set {S} which subsumes the prime numbers.

This post will use the graph-theoretic approach to Szemerédi as in the exposition of David Conlon, Jacob Fox, and Yufei Zhao. In order to motivate the notion of pseudorandom, we return to the graph-theoretic approach to Roth’s theorem, i.e. the case {k=3} of Szemerédi’s theorem.

2. Defining the linear forms condition

2.1. Review of Roth’s theorem

Roth’s theorem can be phrased in two ways. The first is the “set-theoretic” formulation:

Theorem 2 (Roth, set version)

If {A \subseteq \mathbb Z/N} is 3-AP-free, then {|A| = o(N)}.

The second is a “weighted” version:

Theorem 3 (Roth, weighted version)

Fix {\delta > 0}. Let {f : \mathbb Z/N \rightarrow [0,1]} with {\mathbf E f \ge \delta}. Then

\displaystyle \Lambda_3(f,f,f) \ge \Omega_\delta(1).
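As a minimal sketch, assuming the usual normalization {\Lambda_3(f,g,h) = \mathbf E_{x,d} f(x) g(x+d) h(x+2d)} over {x, d \in \mathbb Z/N} (these notes use {\Lambda_3} without spelling it out), the quantity can be computed by brute force for small {N}. The function name `lambda3` is just illustrative.

```python
from fractions import Fraction

def lambda3(f, g, h, N):
    """Brute-force E_{x,d} f(x) g(x+d) h(x+2d) over Z/N (assumed normalization)."""
    total = sum(f[x] * g[(x + d) % N] * h[(x + 2 * d) % N]
                for x in range(N) for d in range(N))
    return Fraction(total, N * N)

N = 7
A = {0, 1, 3}                      # a 3-AP-free subset of Z/7
ind = [1 if x in A else 0 for x in range(N)]
one = [1] * N

print(lambda3(one, one, one, N))   # the constant function 1
print(lambda3(ind, ind, ind, N))   # only the trivial progressions (d = 0) survive
```

For the indicator of a 3-AP-free set the only contributions come from {d = 0}, so the value is {|A|/N^2}, which is the kind of "trivially small" count that Theorem 3 rules out for dense {f} and large {N}.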

We sketch the idea of a graph-theoretic proof of the first theorem. We construct a tripartite graph {G_A} on vertices {X \sqcup Y \sqcup Z}, where {X = Y = Z = \mathbb Z/N}. Then one creates the edges

  • {(x,y)} if {2x+ y \in A},
  • {(x,z)} if {x-z \in A}, and
  • {(y,z)} if {-y-2z \in A}.

This construction is selected so that arithmetic progressions in {A} correspond to triangles in the graph {G_A}. As a result, if {A} has no 3-AP’s (except the trivial ones, which correspond to {x+y+z=0}), then every edge of {G_A} lies in exactly one triangle. Then, we can use the theorem of Ruzsa-Szemerédi, which states that such a graph {G_A} has {o(N^2)} edges. Since {G_A} has {3N|A|} edges, this gives {|A| = o(N)}.
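The correspondence between triangles and progressions can be checked by brute force. The following is a small sketch (the helper name `triangles` and the choice {N = 7}, {A = \{0,1,3\}} are mine, not from the lecture): it lists all triangles of {G_A} and confirms that, for a 3-AP-free set, every triangle is a trivial one with {x+y+z=0}.

```python
from itertools import product

def triangles(A, N):
    """All (x, y, z) in (Z/N)^3 that form a triangle in G_A."""
    A = {a % N for a in A}
    return [(x, y, z) for x, y, z in product(range(N), repeat=3)
            if (2 * x + y) % N in A
            and (x - z) % N in A
            and (-y - 2 * z) % N in A]

N = 7
A = {0, 1, 3}                       # 3-AP-free in Z/7
tris = triangles(A, N)

# Each triangle encodes the progression (2x+y, x-z, -y-2z),
# whose common difference is -(x+y+z).
trivial = [t for t in tris if sum(t) % N == 0]
print(len(tris), len(trivial))      # N * |A| = 21 triangles, all of them trivial
```

Each of the {3N|A|} edges then lies in exactly one of these triangles, which is the hypothesis needed for Ruzsa-Szemerédi.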

2.2. The measure {\nu}

Now for the generalized version, we start with the second version of Roth’s theorem. Instead of a set {S}, we consider a function

\displaystyle \nu : \mathbb Z/N \rightarrow \mathbb R_{\ge 0}

which we call a majorizing measure. Since we are now dealing with {A} of low density, we normalize {\nu} so that

\displaystyle \mathbf E[\nu] = 1 + o(1).

Our goal is to now show a result of the form:

Theorem (Relative Roth, informally, weighted version)

If {0 \le f \le \nu}, {\mathbf E f \ge \delta}, and {\nu} satisfies a “pseudorandom” condition, then {\Lambda_3(f,f,f) \ge \Omega_{\delta}(1)}.

The prototypical example of course is that if {A \subset S \subset \mathbb Z/N}, then we let {\nu(x) = \frac{N}{|S|} 1_S(x)}.

2.3. Pseudorandomness for {k=3}

So, how should we formulate the pseudorandom condition? Initially, consider the tripartite graph {G_S} defined earlier, and let {p = |S| / N}; since {S} is sparse we expect {p} to be small. The main idea that turns out to be correct is: the number of embeddings of {K_{2,2,2}} in {G_S} is “as expected”, namely {(1+o(1)) p^{12} N^6}. Here {K_{2,2,2}} is the {2}-blow-up of a triangle, which has {6} vertices and {12} edges. This condition gives us control over the distribution of triangles in the sparse graph {G_S}: knowing that we have approximately the correct count for {K_{2,2,2}} is enough to control the distribution of triangles.

For technical reasons, in fact we want this to be true not only for {K_{2,2,2}} but all of its subgraphs {H}.

Now, let’s move on to the weighted version. Let’s consider a tripartite graph {G}, which we can think of as a collection of three functions

\displaystyle \begin{aligned} \mu_{-z} &: X \times Y \rightarrow \mathbb R \\ \mu_{-y} &: X \times Z \rightarrow \mathbb R \\ \mu_{-x} &: Y \times Z \rightarrow \mathbb R. \end{aligned}

We think of {\mu} as normalized so that {\mathbf E[\mu_{-x}] = \mathbf E[\mu_{-y}] = \mathbf E[\mu_{-z}] = 1}. Then we can define

Definition 4

A weighted tripartite graph {\mu = (\mu_{-x}, \mu_{-y}, \mu_{-z})} satisfies the {3}-linear forms condition if

\displaystyle \begin{aligned} \mathbf E_{x^0,x^1,y^0,y^1,z^0,z^1} &\Big[ \mu_{-x}(y^0,z^0) \mu_{-x}(y^0,z^1) \mu_{-x}(y^1,z^0) \mu_{-x}(y^1,z^1) \\ & \mu_{-y}(x^0,z^0) \mu_{-y}(x^0,z^1) \mu_{-y}(x^1,z^0) \mu_{-y}(x^1,z^1) \\ & \mu_{-z}(x^0,y^0) \mu_{-z}(x^0,y^1) \mu_{-z}(x^1,y^0) \mu_{-z}(x^1,y^1) \Big] \\ &= 1 + o(1) \end{aligned}

and similarly if any of the twelve factors are deleted.

The pseudorandomness condition on {\nu} is then phrased via the graph we defined above:

Definition 5

A function {\nu : \mathbb Z / N \rightarrow \mathbb R_{\ge 0}} satisfies the {3}-linear forms condition if {\mathbf E[\nu] = 1 + o(1)}, and the tripartite graph {\mu = (\mu_{-x}, \mu_{-y}, \mu_{-z})} defined by

\displaystyle \begin{aligned} \mu_{-z}(x,y) &= \nu(2x+y) \\ \mu_{-y}(x,z) &= \nu(x-z) \\ \mu_{-x}(y,z) &= \nu(-y-2z) \end{aligned}

satisfies the {3}-linear forms condition.

Finally, the relative version of Roth’s theorem which we seek is:

Theorem 6 (Relative Roth)

Suppose {\nu : \mathbb Z/N \rightarrow \mathbb R_{\ge 0}} satisfies the {3}-linear forms condition. Then for any {f : \mathbb Z/N \rightarrow \mathbb R_{\ge 0}} bounded above by {\nu} and satisfying {\mathbf E[f] \ge \delta > 0}, we have

\displaystyle \Lambda_3(f,f,f) \ge \Omega_{\delta}(1).

2.4. Relative Szemerédi

We of course have:

Theorem 7 (Szemerédi)

Suppose {k \ge 3}, and {f : \mathbb Z/N \rightarrow [0,1]} with {\mathbf E[f] \ge \delta}. Then

\displaystyle \Lambda_k(f, \dots, f) \ge \Omega_{\delta}(1).

For {k > 3}, rather than considering weighted tripartite graphs, we consider a {(k-1)}-uniform {k}-partite hypergraph. For example, given {\nu} with {\mathbf E[\nu] = 1 + o(1)} and {k=4}, we use the construction

\displaystyle \begin{aligned} \mu_{-z}(w,x,y) &= \nu(3w+2x+y) \\ \mu_{-y}(w,x,z) &= \nu(2w+x-z) \\ \mu_{-x}(w,y,z) &= \nu(w-y-2z) \\ \mu_{-w}(x,y,z) &= \nu(-x-2y-3z). \end{aligned}

Thus 4-AP’s correspond to the simplex {K_4^{(3)}} (i.e. a tetrahedron). We then consider the two-blow-up of the simplex, and require the same uniformity on all of its subgraphs.

Here is the compiled version:

Definition 8

A {(k-1)}-uniform {k}-partite weighted hypergraph {\mu = (\mu_{-i})_{i=1}^k} satisfies the {k}-linear forms condition if

\displaystyle \mathbf E_{x_1^0, x_1^1, \dots, x_k^0, x_k^1} \left[ \prod_{j=1}^k \prod_{\omega \in \{0,1\}^{[k] \setminus \{j\}}} \mu_{-j}\left( x_1^{\omega_1}, \dots, x_{j-1}^{\omega_{j-1}}, x_{j+1}^{\omega_{j+1}}, \dots, x_k^{\omega_k} \right)^{n_{j,\omega}} \right] = 1 + o(1)

for all exponents {n_{j,\omega} \in \{0,1\}}.

Definition 9

A function {\nu : \mathbb Z/N \rightarrow \mathbb R_{\ge 0}} satisfies the {k}-linear forms condition if {\mathbf E[\nu] = 1 + o(1)}, and

\displaystyle \mathbf E_{x_1^0, x_1^1, \dots, x_k^0, x_k^1} \left[ \prod_{j=1}^k \prod_{\omega \in \{0,1\}^{[k] \setminus \{j\}}} \nu\left( \sum_{i=1}^k (j-i)x_i^{(\omega_i)} \right)^{n_{j,\omega}} \right] = 1 + o(1)

for all exponents {n_{j,\omega} \in \{0,1\}}. This is just the previous condition with the natural {\mu} induced by {\nu}.
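As a quick sanity check (a worked instance added for illustration), take {k=3} and write {(x_1,x_2,x_3) = (x,y,z)}. The linear forms {\sum_{i} (j-i) x_i} for {j = 3, 2, 1} are then:

```latex
\begin{aligned}
j=3 &: \quad (3-1)x + (3-2)y + 0 \cdot z = 2x+y \\
j=2 &: \quad (2-1)x + 0 \cdot y + (2-3)z = x-z \\
j=1 &: \quad 0 \cdot x + (1-2)y + (1-3)z = -y-2z
\end{aligned}
```

These are exactly the arguments of {\nu} in Definition 5. The coefficient of {x_j} vanishes in the form indexed by {j}, which is why that form only needs {\omega \in \{0,1\}^{[k] \setminus \{j\}}}.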

The natural generalization of relative Szemerédi is then:

Theorem 10 (Relative Szemerédi)

Suppose {k \ge 3}, and {\nu : \mathbb Z/N \rightarrow \mathbb R_{\ge 0}} satisfies the {k}-linear forms condition. Let {f : \mathbb Z/N \rightarrow \mathbb R_{\ge 0}} with {\mathbf E[f] \ge \delta}, {f \le \nu}. Then

\displaystyle \Lambda_k(f, \dots, f) \ge \Omega_{\delta}(1).

3. Outline of proof of Relative Szemerédi

The proof of Relative Szemerédi uses two key facts. First, one replaces {f} with a bounded {\widetilde f} which is near it:

Theorem 11 (Dense model)

Let {\varepsilon > 0}. There exists {\varepsilon' > 0} such that if:

  • {\nu : \mathbb Z/N \rightarrow \mathbb R_{\ge 0}} satisfies {\left\lVert \nu-1 \right\rVert^{\square}_r \le \varepsilon'}, and
  • {f : \mathbb Z/N \rightarrow \mathbb R_{\ge 0}}, {f \le \nu}

then there exists a function {\widetilde f : \mathbb Z/N \rightarrow [0,1]} such that {\left\lVert f - \widetilde f \right\rVert^{\square}_r \le \varepsilon}.

Here we have a new norm, called the cut norm, defined by

\displaystyle \left\lVert f \right\rVert^{\square}_r = \sup_{A_i \subseteq (\mathbb Z/N)^{r-1}} \left\lvert \mathbf E_{x_1, \dots, x_r} f(x_1 + \dots + x_r) 1_{A_1}(x_{-1}) \dots 1_{A_r}(x_{-r}) \right\rvert.

This is actually an extension of the cut norm defined on an {r}-uniform {r}-partite hypergraph (not {(r-1)}-uniform like before!): if {g : X_1 \times \dots \times X_r \rightarrow \mathbb R} is such a graph, we let

\displaystyle \left\lVert g \right\rVert^{\square}_{r,r} = \sup_{A_i \subseteq X_{-i}} \left\lvert \mathbf E_{x_1, \dots, x_r} g(x_1, \dots, x_r) 1_{A_1}(x_{-1}) \dots 1_{A_r}(x_{-r}) \right\rvert.

Taking {g(x_1, \dots, x_r) = f(x_1 + \dots + x_r)}, {X_1 = \dots = X_r = \mathbb Z/N} gives the analogy.

For the second theorem, we define the norm

\displaystyle \left\lVert g \right\rVert^{\square}_{k-1,k} = \max_{i=1,\dots,k} \left( \left\lVert g_{-i} \right\rVert^{\square}_{k-1, k-1} \right).

Theorem 12 (Relative simplex counting lemma)

Let {\mu}, {g}, {\widetilde g} be weighted {(k-1)}-uniform {k}-partite hypergraphs on {X_1 \cup \dots \cup X_k}. Assume that {\mu} satisfies the {k}-linear forms condition, and {0 \le g_{-i} \le \mu_{-i}}, {0 \le \widetilde g_{-i} \le 1} for all {i}. If {\left\lVert g-\widetilde g \right\rVert^{\square}_{k-1,k} = o(1)} then

\displaystyle \mathbf E_{x_1, \dots, x_k} \left[ g(x_{-1}) \dots g(x_{-k}) - \widetilde g(x_{-1}) \dots \widetilde g(x_{-k}) \right] = o(1).

One then combines these two results to prove Relative Szemerédi, as follows. Start with {f} and {\nu} as in the theorem. The {k}-linear forms condition turns out to imply {\left\lVert \nu-1 \right\rVert^{\square}_{k-1} = o(1)}. So we can find a nearby {\widetilde f} by the dense model theorem. Then, we induce {\mu}, {g}, {\widetilde g} from {\nu}, {f}, {\widetilde f} respectively. The counting lemma then reduces the bounding of {\Lambda_k(f, \dots, f)} to the bounding of {\Lambda_k(\widetilde f, \dots, \widetilde f)}, which is {\Omega_\delta(1)} by the usual Szemerédi theorem.

4. Arithmetic progressions in primes

We now sketch how to obtain Green-Tao from Relative Szemerédi. As expected, we need to use the von Mangoldt function {\Lambda}.

Unfortunately, {\Lambda} is biased (e.g. “all decent primes are odd”). To get around this, we let {w = w(N)} tend to infinity slowly with {N}, and define

\displaystyle W = \prod_{p \le w} p.

In the {W}-trick we consider only primes congruent to {1 \pmod W}. The modified von Mangoldt function is then defined by

\displaystyle \widetilde \Lambda(n) = \begin{cases} \frac{\varphi(W)}{W} \log (Wn+1) & Wn+1 \text{ prime} \\ 0 & \text{else}. \end{cases}

In accordance with Dirichlet, we have {\sum_{n \le N} \widetilde \Lambda(n) = N + o(N)}.
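The normalization can be sanity-checked numerically. The sketch below is only an illustration with a fixed small {w} (in the actual argument {w} grows with {N}); the function names and the choice {w = 5}, {N = 5000} are my own. It computes the average of {\widetilde\Lambda(n)} over {1 \le n \le N}, which should be close to {1}.

```python
from math import log

def primes_upto(n):
    """Sieve of Eratosthenes; returns a bytearray flag table of length n+1."""
    flags = bytearray([1]) * (n + 1)
    flags[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if flags[p]:
            flags[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return flags

def modified_von_mangoldt_mean(N, w):
    """Average of the W-tricked von Mangoldt function over 1 <= n <= N."""
    small = primes_upto(w)
    W, phi_W = 1, 1
    for p in range(2, w + 1):
        if small[p]:
            W *= p            # W = product of the primes up to w
            phi_W *= p - 1    # Euler phi of the squarefree number W
    is_prime = primes_upto(W * N + 1)
    total = sum(log(W * n + 1) for n in range(1, N + 1) if is_prime[W * n + 1])
    return W, phi_W, (phi_W / W) * total / N

W, phi_W, mean = modified_von_mangoldt_mean(N=5000, w=5)
print(W, phi_W, round(mean, 3))
```

Here {W = 30} and {\varphi(W) = 8}, so the factor {\varphi(W)/W} compensates for only looking at one of the eight reduced residue classes modulo {30}.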

So, we need to show now that

Proposition 13

Fix {k \ge 3}. We can find {\delta = \delta(k) > 0} such that for {N \gg 1} prime, we can find {\nu : \mathbb Z/N \rightarrow \mathbb R_{\ge 0}} which satisfies the {k}-linear forms condition as well as

\displaystyle \nu(n) \ge \delta \widetilde \Lambda(n)

for {N/2 \le n < N}.

In that case, we can let

\displaystyle f(n) = \begin{cases} \delta \widetilde\Lambda(n) & N/2 \le n < N \\ 0 & \text{else}. \end{cases}

Then {0 \le f \le \nu}. The presence of {N/2 \le n < N} allows us to avoid “wrap-around issues” that arise from using {\mathbb Z/N} instead of {\mathbb Z}. Relative Szemerédi then yields the result.

For completeness, we state the construction. Let {\chi : \mathbb R \rightarrow [0,1]} be supported on {[-1,1]} with {\chi(0) = 1}, and define a normalizing constant {c_\chi = \int_0^\infty \left\lvert \chi'(x) \right\rvert^2 \; dx}. Inspired by {\Lambda(n) = \sum_{d \mid n} \mu(d) \log(n/d)}, we define a truncated {\Lambda} by

\displaystyle \Lambda_{\chi, R}(n) = \log R \sum_{d \mid n} \mu(d) \chi\left( \frac{\log d}{\log R} \right).

Let {k \ge 3}, {R = N^{k^{-1} 2^{-k-3}}}. Now, we define {\nu} by

\displaystyle \nu(n) = \begin{cases} \dfrac{\varphi(W)}{W} \dfrac{\Lambda_{\chi,R}(Wn+1)^2}{c_\chi \log R} & N/2 \le n < N \\ 0 & \text{else}. \end{cases}

This turns out to work, provided {w} grows sufficiently slowly in {N}.

Vinogradov’s Three-Prime Theorem (with Sammy Luo and Ryan Alweiss)

This was my final paper for 18.099, seminar in discrete analysis, jointly with Sammy Luo and Ryan Alweiss.

We prove that every sufficiently large odd integer can be written as the sum of three primes, conditioned on a strong form of the prime number theorem.

1. Introduction

In this paper, we prove the following result:

Theorem 1 (Vinogradov)

Every sufficiently large odd integer {N} is the sum of three prime numbers.

In fact, the following result is also true, called the “weak Goldbach conjecture”.

Theorem 2 (Weak Goldbach conjecture)

Every odd integer {N \ge 7} is the sum of three prime numbers.

The proof of Vinogradov’s theorem becomes significantly simpler if one assumes the generalized Riemann hypothesis; this allows one to use a strong form of the prime number theorem (Theorem 9). This conditional proof was given by Hardy and Littlewood in 1923. In 1997, Deshouillers, Effinger, te Riele and Zinoviev showed that the generalized Riemann hypothesis in fact also implies the weak Goldbach conjecture by improving the bound to {10^{20}} and then exhausting the remaining cases via a computer search.

As for unconditional proofs, Vinogradov was able to eliminate the dependency on the generalized Riemann hypothesis in 1937, which is why Theorem 1 bears his name. However, Vinogradov’s bound used the ineffective Siegel-Walfisz theorem; his student K. Borozdin showed that {3^{3^{15}}} is large enough. Over the years the bound was improved, until in 2013 Harald Helfgott claimed the first unconditional proof of Theorem 2.

In this exposition we follow Hardy and Littlewood’s approach, i.e. we prove Theorem 1 assuming the generalized Riemann hypothesis, following the exposition of Rhee. An exposition of the unconditional proof by Vinogradov is given by Rouse.

2. Synopsis

We are going to prove that

\displaystyle  	\sum_{a+b+c = N} \Lambda(a) \Lambda(b) \Lambda(c) \asymp \frac12 N^2 \mathfrak G(N) 	 \ \ \ \ \ (1)

where

\displaystyle  \mathfrak G(N) 	\overset{\text{def}}{=} \prod_{p \mid N} \left( 1 - \frac{1}{(p-1)^2} \right) 	\prod_{p \nmid N} \left( 1 + \frac{1}{(p-1)^3} \right)

and {\Lambda} is the von Mangoldt function defined as usual. Then so long as {2 \nmid N}, the quantity {\mathfrak G(N)} will be bounded away from zero; thus (1) will imply that in fact there are many ways to write {N} as the sum of three distinct prime numbers.
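To get a feel for {\mathfrak G(N)}, here is a small numerical sketch that truncates the Euler product at a cutoff. The cutoff `prime_bound` is a hypothetical parameter of mine, and the truncation is only meaningful when every prime factor of {N} lies below it.

```python
def singular_series(N, prime_bound=10_000):
    """Truncated Euler product for G(N), over primes p <= prime_bound."""
    sieve = bytearray([1]) * (prime_bound + 1)
    value = 1.0
    for p in range(2, prime_bound + 1):
        if not sieve[p]:
            continue                      # p is composite
        for m in range(p * p, prime_bound + 1, p):
            sieve[m] = 0
        if N % p == 0:
            value *= 1 - 1 / (p - 1) ** 2
        else:
            value *= 1 + 1 / (p - 1) ** 3
    return value

print(singular_series(15))   # odd N: comfortably above 1
print(singular_series(16))   # even N: the factor at p = 2 is 1 - 1/1 = 0
```

For odd {N} the factor at {p=2} equals {2}, and the remaining factors at primes dividing {N} are at least {3/4} with a convergent product, which is why {\mathfrak G(N)} stays bounded away from zero.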

The sum (1) is estimated using Fourier analysis. Let us define the following.

Definition 3

Let {\mathbb T = \mathbb R/\mathbb Z} denote the circle group, and let {e : \mathbb T \rightarrow \mathbb C} be the exponential function {\theta \mapsto \exp(2\pi i \theta)}. For {\alpha\in\mathbb T}, {\|\alpha\|} denotes the minimal distance from {\alpha} to an integer.

Note that {|e(\theta)-1|=\Theta(\|\theta\|)}.

Definition 4

For {\alpha \in \mathbb T} and {x > 0} we define

\displaystyle  S(x, \alpha) = \sum_{n \le x} \Lambda(n) e(n\alpha).

Then we can rewrite (1) using {S} as a “Fourier coefficient”:

Proposition 5

We have

\displaystyle  		\sum_{a+b+c = N} \Lambda(a) \Lambda(b) \Lambda(c) 		= \int_{\alpha \in \mathbb T} S(N, \alpha)^3 e(-N\alpha) \; d\alpha. 		 	\ \ \ \ \ (2)

Proof: We have

\displaystyle S(N,\alpha)^3=\sum_{a,b,c\leq N}\Lambda(a)\Lambda(b)\Lambda(c)e((a+b+c)\alpha),

so

\displaystyle \begin{aligned} \int_{\alpha \in \mathbb T} S(N, \alpha)^3 e(-N\alpha) \; d\alpha &= \int_{\alpha \in \mathbb T} \sum_{a,b,c\leq N}\Lambda(a)\Lambda(b)\Lambda(c)e((a+b+c)\alpha) e(-N\alpha) \; d\alpha \\ &= \sum_{a,b,c\leq N}\Lambda(a)\Lambda(b)\Lambda(c)\int_{\alpha \in \mathbb T}e((a+b+c-N)\alpha) \; d\alpha \\ &= \sum_{a,b,c\leq N}\Lambda(a)\Lambda(b)\Lambda(c) \mathbf 1(a+b+c=N) \\ &= \sum_{a+b+c=N}\Lambda(a)\Lambda(b)\Lambda(c), \end{aligned}

as claimed. \Box
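Identity (2) can be verified numerically for small {N}. In the sketch below (function names are mine), the integral over {\mathbb T} is replaced by an average over {3N+1} equally spaced points; this is exact up to rounding, because the integrand is a trigonometric polynomial whose frequencies are smaller than {3N+1} in absolute value.

```python
import cmath
from math import log

def von_mangoldt(n):
    """Lambda(n) = log p if n is a power of a prime p, else 0."""
    if n < 2:
        return 0.0
    p = next(d for d in range(2, n + 1) if n % d == 0)  # smallest prime factor
    while n % p == 0:
        n //= p
    return log(p) if n == 1 else 0.0

def e(theta):
    return cmath.exp(2j * cmath.pi * theta)

def direct_sum(N):
    """Left-hand side of (2): sum over a + b + c = N with a, b, c >= 1."""
    lam = [von_mangoldt(n) for n in range(N + 1)]
    return sum(lam[a] * lam[b] * lam[N - a - b]
               for a in range(1, N) for b in range(1, N - a))

def circle_integral(N):
    """Right-hand side of (2), computed as an exact average over sample points."""
    lam = [von_mangoldt(n) for n in range(N + 1)]
    M = 3 * N + 1
    total = 0
    for j in range(M):
        alpha = j / M
        S = sum(lam[n] * e(n * alpha) for n in range(1, N + 1))
        total += S ** 3 * e(-N * alpha)
    return total / M

N = 15
print(direct_sum(N), circle_integral(N).real)
```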

In order to estimate the integral in Proposition 5, we divide {\mathbb T} into the so-called “major” and “minor” arcs. Roughly,

  • The “major arcs” are subintervals of {\mathbb T} centered at a rational number with small denominator.
  • The “minor arcs” are the remaining intervals.

These will be made more precise later. This general method is called the Hardy-Littlewood circle method, because of the integral over the circle group {\mathbb T}.

The rest of the paper is structured as follows. In Section 3, we define Dirichlet characters and other number-theoretic objects, and state some estimates for the partial sums of these objects conditioned on the generalized Riemann hypothesis. These bounds are then used in Section 4 to provide corresponding estimates on {S(x, \alpha)}. In Section 5 we then define the major and minor arcs rigorously and use the previous estimates to estimate the integral over both. Finally, we complete the proof in Section 6.

3. Prime number theorem type bounds

In this section, we collect the necessary number-theoretic results that we will need. It is in this section only that we will require the generalized Riemann hypothesis.

As a reminder, the notation {f(x)\ll g(x)}, where {f} is a complex function and {g} a nonnegative real one, means {f(x)=O(g(x))}, a statement about the magnitude of {f}. Likewise, {f(x)=g(x)+O(h(x))} simply means that for some {C}, {|f(x)-g(x)|\leq C|h(x)|} for all sufficiently large {x}.

3.1. Dirichlet characters

In what follows, {q} denotes a positive integer.

Definition 6

A Dirichlet character {\chi} modulo {q} is a homomorphism {\chi : (\mathbb Z/q)^\times \rightarrow \mathbb C^\times}. It is said to be trivial if {\chi = 1}; we denote this character by {\chi_0}.

By slight abuse of notation, we will also consider {\chi} as a function {\mathbb Z \rightarrow \mathbb C} by setting {\chi(n) = \chi(n \pmod q)} for {\gcd(n,q) = 1} and {\chi(n) = 0} for {\gcd(n,q) > 1}.

Remark 7

The Dirichlet characters modulo {q} form a group of order {\phi(q)} under multiplication, with inverse given by complex conjugation. Note that {\chi(m)} is a {\phi(q)}th root of unity (not necessarily a primitive one) for any {m \in (\mathbb Z/q)^\times}, thus {\chi} takes values in the unit circle.

Experts may recognize that the Dirichlet characters are just the elements of the Pontryagin dual of {(\mathbb Z/q)^\times}. In particular, they satisfy an orthogonality relationship

\displaystyle  	\frac{1}{\phi(q)} 	\sum_{\chi \text{ mod } q} \chi(n) \overline{\chi(a)} 	= \begin{cases} 		1 & n = a \pmod q \\ 		0 & \text{otherwise} 	\end{cases} 	 \ \ \ \ \ (3)

and thus form an orthonormal basis for functions {(\mathbb Z/q)^\times \rightarrow \mathbb C}.
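The orthogonality relation (3) is easy to test numerically. The sketch below makes a simplifying assumption that is not in the text: {q} is an odd prime, so that the characters can be built from a primitive root. The function names are illustrative only.

```python
import cmath

def characters_mod_prime(q):
    """All Dirichlet characters modulo an odd prime q, as dicts n -> chi(n) on the units."""
    # Find a primitive root g by brute force.
    g = next(g for g in range(2, q)
             if len({pow(g, k, q) for k in range(q - 1)}) == q - 1)
    dlog = {pow(g, k, q): k for k in range(q - 1)}      # discrete logarithm table
    return [{n: cmath.exp(2j * cmath.pi * j * dlog[n] / (q - 1)) for n in dlog}
            for j in range(q - 1)]

def orthogonality(q, n, a):
    """Left-hand side of (3): average of chi(n) * conj(chi(a)) over all chi mod q."""
    chars = characters_mod_prime(q)
    return sum(chi[n % q] * chi[a % q].conjugate() for chi in chars) / len(chars)

q = 7
print(abs(orthogonality(q, 3, 3)))    # close to 1
print(abs(orthogonality(q, 3, 5)))    # close to 0
```

Both arguments must be coprime to {q} here, since the dictionaries only store values on the units.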

3.2. Prime number theorem for arithmetic progressions

Definition 8

The generalized Chebyshev function is defined by

\displaystyle  \psi(x, \chi) = \sum_{n \le x} \Lambda(n) \chi(n).

The Chebyshev function is studied extensively in analytic number theory, as it is the most convenient way to phrase the major results of analytic number theory. For example, the prime number theorem is equivalent to the assertion that

\displaystyle \psi(x, \chi_0) = \sum_{n \le x} \Lambda(n) \sim x

where {q = 1} (thus {\chi_0} is the constant function {1}). Similarly, Dirichlet’s theorem actually asserts that for any {q \ge 1},

\displaystyle  	\psi(x, \chi) 	= \begin{cases} 		x + o_q(x) & \chi = \chi_0 \text{ trivial} \\ 		o_q(x) & \chi \neq \chi_0 \text{ nontrivial}. 	\end{cases}

However, the error term in these estimates is quite poor (more than {x^{1-\varepsilon}} for every {\varepsilon}). By assuming the Riemann Hypothesis for a certain “{L}-function” attached to {\chi}, we can improve the error terms substantially.

Theorem 9 (Prime number theorem for arithmetic progressions)

Let {\chi} be a Dirichlet character modulo {q}, and assume the Riemann hypothesis for the {L}-function attached to {\chi}.

  1. If {\chi} is nontrivial, then

    \displaystyle  \psi(x, \chi) \ll \sqrt{x} (\log qx)^2.

  2. If {\chi = \chi_0} is trivial, then

    \displaystyle  \psi(x, \chi_0) = x + O\left( \sqrt x (\log x)^2 + \log q \log x \right).

Theorem 9 is the strong estimate that we will require when putting good estimates on {S(x, \alpha)}, and is the only place in which the generalized Riemann Hypothesis is actually required.

3.3. Gauss sums

Definition 10

For {\chi} a Dirichlet character modulo {q}, the Gauss sum {\tau(\chi)} is defined by

\displaystyle \tau(\chi)=\sum_{a=0}^{q-1}\chi(a)e(a/q).

We will need the following fact about Gauss sums.

Lemma 11

Consider Dirichlet characters modulo {q}. Then:

  1. We have {\tau(\chi_0) = \mu(q)}.
  2. For any {\chi} modulo {q}, {\left\lvert \tau(\chi) \right\rvert \le \sqrt q}.
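The first part of Lemma 11 can be checked directly, since {\tau(\chi_0)} is just the sum of {e(a/q)} over residues {a} coprime to {q}. This is only a numerical sketch with helper names of my choosing; it does not touch the second part of the lemma.

```python
import cmath
from math import gcd

def mobius(n):
    """Moebius function by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0          # n has a square factor
            result = -result
        p += 1
    return -result if n > 1 else result

def gauss_sum_trivial(q):
    """tau(chi_0) = sum of e(a/q) over residues a coprime to q."""
    return sum(cmath.exp(2j * cmath.pi * a / q)
               for a in range(q) if gcd(a, q) == 1)

for q in (1, 6, 7, 12, 30):
    print(q, mobius(q), round(gauss_sum_trivial(q).real, 6))
```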

3.4. Dirichlet approximation

We finally require the Dirichlet approximation theorem in the following form.

Theorem 12 (Dirichlet approximation)

Let {\alpha \in \mathbb R} be arbitrary, and {M} a fixed positive integer. Then there exist integers {a} and {q = q(\alpha)}, with {1 \le q \le M} and {\gcd(a,q) = 1}, satisfying

\displaystyle  \left\lvert \alpha - \frac aq \right\rvert \le \frac{1}{qM}.
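For small {M} the approximation can be found by direct search. The following sketch (the function name `dirichlet_approx` is mine) takes {\alpha} as an exact `Fraction` so that the inequality is tested without rounding error, and returns the first pair {(a, q)} that works.

```python
from fractions import Fraction
from math import gcd

def dirichlet_approx(alpha, M):
    """Return coprime (a, q) with 1 <= q <= M and |alpha - a/q| <= 1/(q*M).

    alpha should be a Fraction (or int) so that the comparison is exact.
    """
    for q in range(1, M + 1):
        a = round(alpha * q)                     # nearest integer to q * alpha
        if abs(alpha * q - a) <= Fraction(1, M):
            g = gcd(a, q)                        # reducing only improves the bound
            return a // g, q // g
    raise AssertionError("unreachable: Dirichlet's theorem guarantees a solution")

alpha = Fraction(41421356, 100000000)            # roughly sqrt(2) - 1
a, q = dirichlet_approx(alpha, 10)
print(a, q)                                      # 2 5
```

Dividing out the common factor keeps the left-hand side unchanged and only shrinks {q}, so the reduced pair still satisfies the inequality.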

4. Bounds on {S(x, \alpha)}

In this section, we use our number-theoretic results to bound {S(x,\alpha)}.

First, we provide a bound for {S(x,\alpha)} if {\alpha} is a rational number with “small” denominator {q}.

Lemma 13

Let {\gcd(a,q) = 1}. Assuming Theorem 9, we have

\displaystyle  S(x, a/q) 		= \frac{\mu(q)}{\phi(q)} x + O\left( \sqrt{qx} (\log qx)^2 \right)

where {\mu} denotes the Möbius function.

Proof: Write the sum as

\displaystyle  S(x, a/q) = \sum_{n \le x} \Lambda(n) e(na/q).

First we claim that the terms {\gcd(n,q) > 1} (and {\Lambda(n) \neq 0}) contribute a negligibly small {\ll \log q \log x}. To see this, note that

  • The number {q} has {\ll \log q} distinct prime factors, and
  • For each prime {p \mid q}, if {p^e} is the largest power of {p} with {p^e \le x}, then {\Lambda(p) + \dots + \Lambda(p^e) = e\log p = \log(p^e) \le \log x}.

So consider only terms with {\gcd(n,q) = 1}. To bound the sum, notice that

\displaystyle  \begin{aligned} 		e(n \cdot a/q) &= \sum_{b \text{ mod } q} e(b/q) \cdot \mathbf 1(b \equiv an) \\ 		&= \sum_{b \text{ mod } q} e(b/q) \left( \frac{1}{\phi(q)} 			\sum_{\chi \text{ mod } q} \chi(b) \overline{\chi(an)} \right) 	\end{aligned}

by the orthogonality relations. Now we swap the order of summation to obtain a Gauss sum:

\displaystyle  \begin{aligned} 		e(n \cdot a/q) &= \frac{1}{\phi(q)} \sum_{\chi \text{ mod } q} \overline{\chi(an)} 			\left( \sum_{b \text{ mod } q} \chi(b) e(b/q) \right) \\ 		&= \frac{1}{\phi(q)} \sum_{\chi \text{ mod } q} \overline{\chi(an)} \tau(\chi). 	\end{aligned}

Thus, we swap the order of summation to obtain that

\displaystyle  \begin{aligned} 		S(x, \alpha) &= \sum_{\substack{n \le x \\ \gcd(n,q) = 1}} 			\Lambda(n) e(n \cdot a/q) \\ 		&= \frac{1}{\phi(q)} \sum_{\substack{n \le x \\ \gcd(n,q) = 1}} 			\sum_{\chi \text{ mod } q} \Lambda(n) \overline{\chi(an)} \tau(\chi) \\ 		&= \frac{1}{\phi(q)} \sum_{\chi \text{ mod } q} \tau(\chi) 			\sum_{\substack{n \le x \\ \gcd(n,q) = 1}} \Lambda(n) \overline{\chi(an)} \\ 		&= \frac{1}{\phi(q)} \sum_{\chi \text{ mod } q} \overline{\chi(a)} \tau(\chi) 			\sum_{\substack{n \le x \\ \gcd(n,q) = 1}} \Lambda(n)\overline{\chi(n)} \\ 		&= \frac{1}{\phi(q)} \sum_{\chi \text{ mod } q} \overline{\chi(a)} 			\tau(\chi) \psi(x, \overline\chi) \\ 		&= \frac{1}{\phi(q)} \left( \tau(\chi_0) \psi(x, \chi_0) 			+ \sum_{1 \neq \chi \text{ mod } q} \overline{\chi(a)} \tau(\chi) 			\psi(x, \overline\chi) \right). 	\end{aligned}

Now applying both parts of Lemma 11 in conjunction with Theorem 9 gives

\displaystyle \begin{aligned} S(x, a/q) &= \frac{\mu(q)}{\phi(q)} \left( x + O\left( \sqrt x (\log qx)^2 \right) \right) + O\left( \sqrt{qx} (\log qx)^2 \right) \\ &= \frac{\mu(q)}{\phi(q)} x + O\left( \sqrt{qx} (\log qx)^2 \right) \end{aligned}

as desired. \Box
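Lemma 13 is conditional, but its main term is easy to observe numerically. The sketch below (my own helper names, and an arbitrary choice of {x = 10^4}, {a/q = 1/3}) compares {S(x, a/q)} with {\frac{\mu(q)}{\phi(q)} x = -x/2}; it illustrates the statement and is of course not evidence for the error term.

```python
import cmath
from math import log

def von_mangoldt_table(x):
    """lam[n] = Lambda(n) for 0 <= n <= x, via a smallest-prime-factor sieve."""
    spf = list(range(x + 1))
    for p in range(2, int(x ** 0.5) + 1):
        if spf[p] == p:
            for m in range(p * p, x + 1, p):
                if spf[m] == m:
                    spf[m] = p
    lam = [0.0] * (x + 1)
    for n in range(2, x + 1):
        p, m = spf[n], n
        while m % p == 0:
            m //= p
        if m == 1:                 # n is a power of the prime p
            lam[n] = log(p)
    return lam

def S(x, a, q):
    """S(x, a/q) = sum over n <= x of Lambda(n) e(n a / q)."""
    lam = von_mangoldt_table(x)
    return sum(lam[n] * cmath.exp(2j * cmath.pi * ((n * a) % q) / q)
               for n in range(1, x + 1))

x, a, q = 10_000, 1, 3
main_term = -x / 2                 # mu(3)/phi(3) * x
print(S(x, a, q), main_term)
```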

We then provide a bound when {\alpha} is “close to” such an {a/q}.

Lemma 14

Let {\gcd(a,q) = 1} and {\beta \in \mathbb T}. Assuming Theorem 9, we have

\displaystyle  		S(x, a/q + \beta) = 		\frac{\mu(q)}{\phi(q)} \left( \sum_{n \le x} e(\beta n) \right) 		+ O\left( (1+\|\beta\|x) \sqrt{qx} (\log qx)^2 \right).

Proof: For convenience let us assume {x \in \mathbb Z}. Let {\alpha = a/q + \beta}. Let us denote {\text{Err}(n) = S(n, a/q) - \frac{\mu(q)}{\phi(q)} n}, so that {\text{Err}(0) = 0} and by Lemma 13 we have {\text{Err}(n) \ll \sqrt{qx}(\log qx)^2} for all {n \le x}. Summing by parts and using {|e((m+1)\beta) - e(m\beta)| = |e(\beta) - 1| \ll \|\beta\|}, we have

\displaystyle \begin{aligned} S(x, \alpha) &= \sum_{n \le x} \Lambda(n) e(na/q) e(n\beta) \\ &= \sum_{n \le x} e(n\beta) \left( S(n, a/q) - S(n-1, a/q) \right) \\ &= \sum_{n \le x} e(n\beta) \left( \frac{\mu(q)}{\phi(q)} + \text{Err}(n) - \text{Err}(n-1) \right) \\ &= \frac{\mu(q)}{\phi(q)} \left( \sum_{n \le x} e(n\beta) \right) + \sum_{1 \le m \le x-1} \left( e(m\beta) - e((m+1)\beta) \right) \text{Err}(m) \\ &\qquad + e(x\beta) \text{Err}(x) - e(\beta) \text{Err}(0) \\ &= \frac{\mu(q)}{\phi(q)} \left( \sum_{n \le x} e(n\beta) \right) + O\left( \sum_{1 \le m \le x-1} \|\beta\| \left\lvert \text{Err}(m) \right\rvert + \left\lvert \text{Err}(x) \right\rvert \right) \\ &= \frac{\mu(q)}{\phi(q)} \left( \sum_{n \le x} e(n\beta) \right) + O\left( \left( 1+x\left\| \beta \right\| \right) \sqrt{qx} (\log qx)^2 \right) \end{aligned}

as desired. \Box

Thus if {\alpha} is close to a fraction with small denominator, the value of {S(x, \alpha)} is bounded above. We can now combine this with the Dirichlet approximation theorem to obtain the following general result.

Corollary 15

Suppose {M = N^{2/3}} and suppose {\left\lvert \alpha - a/q \right\rvert \le \frac{1}{qM}} for some {\gcd(a,q) = 1} with {q \le M}. Assuming Theorem 9, we have

\displaystyle  S(x, \alpha) \ll \frac{x}{\varphi(q)} + x^{\frac56+\varepsilon}

for any {\varepsilon > 0}.

Proof: Apply Lemma 14 directly. \Box
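To spell out the omitted computation (a sketch, taking {x = N} so that {M = x^{2/3}}, which is how the corollary is used later): write {\alpha = a/q + \beta} with {\|\beta\| \le 1/(qM)}. The main term of Lemma 14 is at most {x/\phi(q)} in absolute value, since {|\mu(q)| \le 1} and the exponential sum has {x} terms of modulus {1}. For the error term, using {q \le M = x^{2/3}}:

```latex
\begin{aligned}
(1 + \|\beta\| x) \sqrt{qx}
&\le \sqrt{qx} + \frac{x}{qM} \sqrt{qx}
 = \sqrt{qx} + \frac{x^{3/2}}{M \sqrt q} \\
&\le \sqrt{x^{2/3} \cdot x} + \frac{x^{3/2}}{x^{2/3}}
 = 2 x^{5/6}
\end{aligned}
```

The remaining factor {(\log qx)^2} is absorbed into {x^{\varepsilon}}.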

5. Estimation of the arcs

We’ll write

\displaystyle  f(\alpha) \overset{\text{def}}{=} S(N,\alpha)=\sum_{n \le N} \Lambda(n)e(n\alpha)

for brevity in this section.

Recall that we wish to bound the right-hand side of (2) in Proposition 5. We split {[0,1]} into two sets, which we call the “major arcs” and the “minor arcs.” To do so, we use Dirichlet approximation, as hinted at earlier.

In what follows, fix

\displaystyle  \begin{aligned} 	M &= N^{2/3} \\ 	K &= (\log N)^{10}. \end{aligned}

5.1. Setting up the arcs

Definition 16

For {q \le K} and {\gcd(a,q) = 1}, {1 \le a \le q}, we define

\displaystyle \mathfrak M(a,q) = \left\{ \alpha \in \mathbb T \mid \left\lvert \alpha - \frac aq \right\rvert \le \frac{1}{qM} \right\}.

These will be the major arcs. The union of all major arcs is denoted by {\mathfrak M}. The complement is denoted by {\mathfrak m}.

Equivalently, for any {\alpha}, consider {q = q(\alpha) \le M} as in Theorem 12. Then {\alpha \in \mathfrak M} if {q \le K} and {\alpha \in \mathfrak m} otherwise.

Proposition 17

{\mathfrak M} is composed of finitely many disjoint intervals {\mathfrak M(a,q)} with {q \le K}. The complement {\mathfrak m} is nonempty.

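Proof: Note that if {q_1, q_2 \le K} and {a/q_1 \neq b/q_2} then {\left\lvert \frac{a}{q_1} - \frac{b}{q_2} \right\rvert \ge \frac{1}{q_1q_2} \ge \frac{1}{K^2}}, which exceeds {\frac 2M} once {N} is large. Since each arc has radius at most {\frac 1M}, distinct arcs are disjoint. Moreover there are at most {K^2} arcs, so their total length is at most {\frac{2K^2}{M}}, which is less than {1} for large {N}; hence {\mathfrak m} is nonempty. \Box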

In particular both {\mathfrak M} and {\mathfrak m} are measurable. Thus we may split the integral in (2) over {\mathfrak M} and {\mathfrak m}. This integral will have large magnitude on the major arcs, and small magnitude on the minor arcs, so over the whole interval {[0,1]} it will have large magnitude.

5.2. Estimate of the minor arcs

First, we note the well known fact {\phi(q) \gg q/\log q}. Note also that if {q=q(\alpha)} as in the last section and {\alpha} is on a minor arc, we have {q > (\log N)^{10}}, and thus {\phi(q) \gg (\log N)^{9}}.

As such Corollary 15 yields that {f(\alpha) \ll \frac{N}{\phi(q)}+N^{.834} \ll \frac{N}{(\log N)^9}}.

Now,

\displaystyle  \begin{aligned} 	\left\lvert \int_{\mathfrak m}f(\alpha)^3e(-N\alpha) \; d\alpha \right\rvert 	&\le \int_{\mathfrak m}\left\lvert f(\alpha)\right\rvert ^3 \; d\alpha \\ 	&\ll \frac{N}{(\log N)^9} \int_{0}^{1}\left\lvert f(\alpha)\right\rvert ^2 \;d\alpha \\ 	&=\frac{N}{(\log N)^9}\int_{0}^{1}f(\alpha)f(-\alpha) \; d\alpha \\ 	&=\frac{N}{(\log N)^9}\sum_{n \le N} \Lambda(n)^2 \\ 	&\ll \frac{N^2}{(\log N)^8}, \end{aligned}

using the well known bound {\sum_{n \le N} \Lambda(n)^2 \ll N \log N}. This bound of {\frac{N^2}{(\log N)^8}} will be negligible compared to lower bounds for the major arcs in the next section.

5.3. Estimate on the major arcs

We show that

\displaystyle  \int_{\mathfrak M}f(\alpha)^3e(-N\alpha) d\alpha \asymp \frac{N^2}{2} \mathfrak G(N).

By Proposition 17 we can split the integral over each interval and write

\displaystyle  \int_{\mathfrak M} f(\alpha)^3e(-N\alpha) \; d\alpha 	= \sum_{q \le (\log N)^{10}}\sum_{\substack{1 \le a \le q \\ \gcd(a,q)=1}} 	\int_{-1/qM}^{1/qM}f(a/q+\beta)^3e(-N(a/q+\beta)) \; d\beta.

Then we apply Lemma 14, which gives

\displaystyle  \begin{aligned} 	f(a/q+\beta)^3 	&= \left(\frac{\mu(q)}{\phi(q)}\sum_{n \le N}e(\beta n) \right)^3 \\ 	&+\left(\frac{\mu(q)}{\phi(q)}\sum_{n \le N}e(\beta n)\right)^2 		O\left((1+\|\beta\|N)\sqrt{qN} \log^2 qN\right) \\ 	&+\left(\frac{\mu(q)}{\phi(q)}\sum_{n \le N}e(\beta n)\right) 		O\left((1+\|\beta\|N)\sqrt{qN} \log^2 qN\right)^2 \\ 	&+O\left((1+\|\beta\|N)\sqrt{qN} \log^2 qN\right)^3. \end{aligned}

Now, we can do casework on the side of {N^{-.9}} that {\|\beta\|} lies on.

  • If {\|\beta\| \gg N^{-.9}}, we have {\sum_{n \le N}e(\beta n) \ll \frac{2}{|e(\beta)-1|} 	\ll \frac{1}{\|\beta\|} \ll N^{.9}}, and {(1+\|\beta\|N)\sqrt{qN} \log^2 qN \ll N^{5/6+\varepsilon}}, because certainly we have {\|\beta\|<1/M=N^{-2/3}}.
  • If on the other hand {\|\beta\|\ll N^{-.9}}, we have {\sum_{n \le N}e(\beta n) \ll N} obviously, and {(1+\|\beta\|N)\sqrt{qN} \log^2 qN \ll N^{3/5+\varepsilon}}.
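The exponent {79/30} in the next display comes from the largest cross term in the expansion of the cube, namely the square of the main-term bound times the error bound. A short check of the arithmetic in each case (the other cross terms are smaller):

```latex
\begin{aligned}
\|\beta\| \gg N^{-.9} &: \quad (N^{.9})^2 \cdot N^{5/6+\varepsilon} = N^{9/5 + 5/6 + \varepsilon} = N^{79/30+\varepsilon} \\
\|\beta\| \ll N^{-.9} &: \quad N^2 \cdot N^{3/5+\varepsilon} = N^{13/5+\varepsilon} = N^{78/30+\varepsilon}
\end{aligned}
```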

As such, we obtain

\displaystyle f(a/q+\beta)^3 = \left( \frac{\mu(q)}{\phi(q)}\sum_{n \le N}e(\beta n) \right)^3 + O\left(N^{79/30+\varepsilon}\right)

in either case. Thus, we can write

\displaystyle \begin{aligned} &\qquad \int_{\mathfrak M} f(\alpha)^3e(-N\alpha) \; d\alpha \\ &= \sum_{q \le (\log N)^{10}} \sum_{\substack{1 \le a \le q \\ \gcd(a,q)=1}} \int_{-1/qM}^{1/qM} f(a/q+\beta)^3e(-N(a/q+\beta)) \; d\beta \\ &= \sum_{q \le (\log N)^{10}} \sum_{\substack{1 \le a \le q \\ \gcd(a,q)=1}} \int_{-1/qM}^{1/qM}\left[\left(\frac{\mu(q)}{\phi(q)}\sum_{n \le N}e(\beta n)\right)^3 + O\left(N^{79/30+\varepsilon}\right)\right]e(-N(a/q+\beta)) \; d\beta \\ &=\sum_{q \le (\log N)^{10}} \frac{\mu(q)}{\phi(q)^3} \left(\sum_{\substack{1 \le a \le q \\ \gcd(a,q)=1}} e(-N(a/q))\right) \left( \int_{-1/qM}^{1/qM}\left(\sum_{n \le N}e(\beta n)\right)^3e(-N\beta) \; d\beta \right ) \\ &\qquad +O\left(N^{59/30+\varepsilon}\right), \end{aligned}

using {1/M = N^{-2/3}} and the fact that there are at most {K^2 \ll N^{\varepsilon}} arcs. Now, we use

\displaystyle \sum_{n \le N}e(\beta n) = e(\beta) \cdot \frac{1-e(\beta N)}{1-e(\beta)} \ll \frac{1}{\|\beta\|}.

This enables us to bound the expression

\displaystyle  \int_{1/qM}^{1-1/qM}\left (\sum_{n \le N}e(\beta n)\right) ^ 3 e(-N\beta)d\beta 	\ll \int_{1/qM}^{1-1/qM}\|\beta\|^{-3} d\beta = 2\int_{1/qM}^{1/2}\beta^{-3} d\beta 	\ll q^2M^2.

But the integral over the entire interval is

\displaystyle \begin{aligned} \int_{0}^{1}\left(\sum_{n \le N}e(\beta n) \right)^3 e(-N\beta) \; d\beta &= \int_{0}^{1} \sum_{a,b,c \le N} e((a+b+c-N)\beta) \; d\beta \\ &= \sum_{a,b,c \le N} \mathbf 1(a+b+c=N) \\ &= \binom{N-1}{2}. \end{aligned}

Considering the difference of the two integrals gives

\displaystyle  \int_{-1/qM}^{1/qM}\left(\sum_{n \le N}e(\beta n) \right)^3 	e(-N\beta) \; d\beta - \frac{N^2}{2} \ll q^2 M^2 + N 	\ll (\log N)^c N^{4/3},

for some absolute constant {c}.

For brevity, let

\displaystyle  S_q = \sum_{\substack{1 \le a \le q \\ \gcd(a,q)=1}} e(-N(a/q)).

Then

\displaystyle  \begin{aligned} \int_{\mathfrak M} f(\alpha)^3e(-N\alpha) \; d\alpha &= \sum_{q \le (\log N)^{10}} \frac{\mu(q)}{\phi(q)^3}S_q \left( \int_{-1/qM}^{1/qM}\left(\sum_{n \le N}e(\beta n)\right)^3e(-N\beta) \; d\beta \right ) \\ &\qquad +O\left(N^{59/30+\varepsilon}\right) \\ &= \frac{N^2}{2}\sum_{q \le (\log N)^{10}} \frac{\mu(q)}{\phi(q)^3}S_q + O((\log N)^{10+c} N^{4/3}) + O(N^{59/30+\varepsilon}) \\ &= \frac{N^2}{2}\sum_{q \le (\log N)^{10}} \frac{\mu(q)}{\phi(q)^3}S_q + O(N^{59/30+\varepsilon}). \end{aligned}

Since {S_q} is a sum of {\phi(q)} terms of modulus {1}, we have {|S_q| \le \phi(q)}. So,

\displaystyle \left\lvert \sum_{q>(\log N)^{10}} 	\frac{\mu(q)}{\phi(q)^3} S_q \right\rvert 	 \le \sum_{q>(\log N)^{10}} \frac{1}{\phi(q)^2},

which tends to zero as {N \rightarrow \infty}, because {\sum_q \phi(q)^{-2}} converges ({\phi(q)^2 \gg q^c} for some {c > 1}). Extending the sum to all {q} therefore costs only {o(N^2)}, so

\displaystyle  \int_{\mathfrak M} f(\alpha)^3e(-N\alpha) \; d\alpha = \frac{N^2}{2}\sum_{q = 1}^\infty \frac{\mu(q)}{\phi(q)^3}S_q + o(N^2).

Now, since {\mu(q)}, {\phi(q)}, and {\sum_{\substack{1 \le a \le q \\ \gcd(a,q)=1}} e(-N(a/q))} are multiplicative functions of {q}, and {\mu(q)=0} unless {q} is squarefree,

\displaystyle  \begin{aligned} \sum_{q = 1}^\infty \frac{\mu(q)}{\phi(q)^3} S_q &= \prod_p \left(1+\frac{\mu(p)}{\phi(p)^3}S_p \right) \\ &= \prod_p \left(1-\frac{1}{(p-1)^3} \sum_{a=1}^{p-1} e(-N(a/p))\right) \\ &= \prod_p \left(1-\frac{1}{(p-1)^3} (p\cdot \mathbf 1(p|N) - 1)\right) \\ &= \prod_{p|N}\left(1-\frac{1}{(p-1)^2}\right) \prod_{p \nmid N}\left(1+\frac{1}{(p-1)^3}\right) \\ &= \mathfrak G(N). \end{aligned}
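To see the Euler product in action, here is a rough Python sketch of my own that compares a truncation of the series {\sum_q \mu(q) S_q / \phi(q)^3} with a truncation of the product. The cutoffs (300 in both cases) and all function names are arbitrary choices for illustration, not anything from the lecture.

```python
import cmath
from math import gcd

def mobius_and_phi(q):
    """Return (mu(q), phi(q)) by trial division."""
    mu, phi, m, p = 1, 1, q, 2
    while p * p <= m:
        if m % p == 0:
            exponent = 0
            while m % p == 0:
                m //= p
                exponent += 1
            mu = -mu if exponent == 1 else 0
            phi *= (p - 1) * p ** (exponent - 1)
        p += 1
    if m > 1:
        mu, phi = -mu, phi * (m - 1)
    return mu, phi

def S(q, N):
    """S_q = sum of e(-N a/q) over 1 <= a <= q coprime to q (a real number)."""
    return sum(cmath.exp(-2j * cmath.pi * N * a / q)
               for a in range(1, q + 1) if gcd(a, q) == 1).real

def singular_series_by_sum(N, Q):
    total = 0.0
    for q in range(1, Q + 1):
        mu, phi = mobius_and_phi(q)
        if mu != 0:
            total += mu * S(q, N) / phi ** 3
    return total

def singular_series_by_product(N, P):
    value = 1.0
    for p in range(2, P + 1):
        if mobius_and_phi(p)[1] == p - 1:  # phi(p) = p - 1 exactly when p is prime
            value *= 1 - 1 / (p - 1) ** 2 if N % p == 0 else 1 + 1 / (p - 1) ** 3
    return value

for N in (15, 21, 101, 16):
    print(N, singular_series_by_sum(N, 300), singular_series_by_product(N, 300))
```

For odd {N} the two truncations agree closely and stay above {1/2}. For even {N} the factor at {p=2} kills the product, which is why the three-prime statement is restricted to odd {N}.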

So,

\displaystyle \int_{\mathfrak M} f(\alpha)^3e(-N\alpha) \; d\alpha = \frac{N^2}{2}\mathfrak{G}(N) + o(N^2).

When {N} is odd,

\displaystyle  \mathfrak{G}(N) = \prod_{p|N}\left(1-\frac{1}{(p-1)^2}\right)\prod_{p \nmid N}\left(1+\frac{1}{(p-1)^3}\right)\geq \prod_{m\geq 3}\left(\frac{m-2}{m-1}\frac{m}{m-1}\right)=\frac{1}{2},

so that we have

\displaystyle \int_{\mathfrak M} f(\alpha)^3e(-N\alpha) \; d\alpha \asymp \frac{N^2}{2}\mathfrak{G}(N),

as desired.

6. Completing the proof

Because the integral over the minor arc is {o(N^2)}, it follows that

\displaystyle \sum_{a+b+c=N} \Lambda(a)\Lambda(b)\Lambda(c) = \int_{0}^{1} f(\alpha)^3 e(-N\alpha) d \alpha \asymp \frac{N^2}{2}\mathfrak{G}(N) \gg N^2.

Consider the set {S_N} of integers {p^k\leq N} with {k>1}. We must have {p \le N^{\frac{1}{2}}}, and for each such {p} there are at most {O(\log N)} possible values of {k}. As such, {|S_N| \ll\pi(N^{1/2}) \log N\ll N^{1/2}}.
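The count of proper prime powers is easy to confirm by machine. The following Python sketch is my own illustration, with a hypothetical function name. It lists all {p^k \le N} with {k > 1} using a sieve up to {\sqrt N}.

```python
from math import isqrt

def proper_prime_powers(N):
    """All p^k <= N with p prime and k >= 2 (assumes N >= 4)."""
    limit = isqrt(N)
    is_prime = [True] * (limit + 1)
    is_prime[0:2] = [False, False]
    powers = []
    for p in range(2, limit + 1):
        if is_prime[p]:
            # Cross out multiples of p, then collect p^2, p^3, ... up to N.
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
            q = p * p
            while q <= N:
                powers.append(q)
                q *= p
    return sorted(powers)

print(proper_prime_powers(30))
for N in (10 ** 4, 10 ** 6):
    print(N, len(proper_prime_powers(N)), isqrt(N))
```

In both test ranges the count stays below {N^{1/2}}, in line with the bound {|S_N| \ll N^{1/2}}.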

Thus

\displaystyle \sum_{\substack{a+b+c=N \\ a\in S_N}} \Lambda(a)\Lambda(b)\Lambda(c) \ll (\log N)^3 |S_N|N \ll (\log N)^3 N^{3/2},

and similarly for {b\in S_N} and {c\in S_N}. Since {\Lambda} vanishes away from prime powers, the terms with {a\in S_N} are exactly the nonzero terms with {a} composite, so

\displaystyle  \sum_{p_1+p_2+p_3=N} \Lambda(p_1)\Lambda(p_2)\Lambda(p_3) 	=\sum_{a+b+c=N} \Lambda(a)\Lambda(b)\Lambda(c) + O(\log(N)^3 N^{3/2}) 	\gg N^2,

where the sum is over primes {p_i}. This finishes the proof.

18.099 Transcript: Bourgain’s Theorem

As part of the 18.099 Discrete Analysis reading group at MIT, I presented section 4.7 of Tao-Vu’s Additive Combinatorics textbook. Here were the notes I used for the second half of my presentation.

1. Synopsis

We aim to prove the following result.

Theorem 1 (Bourgain)

Assume {N \ge 2} is prime and {A, B \subseteq Z = \mathbb Z_N}. Assume that

\displaystyle  \delta \gg \sqrt{\frac{(\log \log N)^3}{\log N}}

is such that {\min\left\{ \mathbf P_ZA, \mathbf P_ZB \right\} \ge \delta}. Then {A+B} contains a proper arithmetic progression of length at least

\displaystyle  \exp\left( C\sqrt[3]{\delta^2 \log N} \right)

for some absolute constant {C > 1}.

The methods that we used with Bohr sets fail here, because in the previous half of yesterday’s lecture we took advantage of Parseval’s identity in order to handle large convolutions, always keeping two {\widehat 1_\ast} terms inside the {\sum} sign. When we work with {A+B} this causes us to be stuck. So, we instead use the technology of {\Lambda(p)} constants and dissociated sets.

2. Previous results

As usual, let {Z} denote a finite abelian group. Recall that

Definition 2

Let {S \subseteq Z} and {2 \le p \le \infty}. The {\Lambda(p)} constant of {S}, denoted {\left\lVert S \right\rVert_{\Lambda(p)}}, is defined as

\displaystyle  \left\lVert S \right\rVert_{\Lambda(p)} = \sup_{\substack{c : S \rightarrow \mathbb C \\ c \not\equiv 0}} \frac{\left\lVert \displaystyle\sum_{\xi \in S} c(\xi) e(\xi \cdot x) \right\rVert_{L^p(Z)}} {\left\lVert c \right\rVert_{\ell^2(S)}}.

Definition 3

If {S \subseteq Z}, we say {S} is a dissociated set if all {2^{|S|}} subset sums of {S} are distinct.

For such sets we have Rudin’s inequality (yes, Walter), which states:

Lemma 4 (Rudin’s inequality)

If {S} is dissociated then

\displaystyle  \left\lVert S \right\rVert_{\Lambda(p)} \ll \sqrt p.

Dissociated sets come up via the so-called “cube covering lemma”:

Lemma 5 (Cube covering lemma)

Let {S \subseteq Z} and {d \ge 1}. Then we can partition

\displaystyle  S = D_1 \sqcup D_2 \sqcup \dots \sqcup D_k \sqcup R

such that

  • Each {D_i} is dissociated of size {d+1},
  • There exists {\eta_1}, {\dots}, {\eta_d} such that {R} is contained in a {d}-cube, i.e. it’s covered by {c_1\eta_1 + \dots + c_d\eta_d}, where {c_i \in \{-1,0,1\}}.
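One way to see why such a partition exists is the greedy argument: keep pulling out dissociated sets of size {d+1} until no longer possible; a maximal dissociated subset of what remains then has at most {d} elements and generates a cube covering the remainder. Here is a brute-force Python sketch of that idea for small sets in {\mathbb Z_N}. It is my own illustration with hypothetical function names, not the proof from Tao-Vu, and it may return fewer than {d} generators (pad with repeats if exactly {d} are wanted).

```python
from itertools import product

def is_dissociated(elems, N):
    """True if all 2^|elems| subset sums are distinct mod N."""
    seen = set()
    for mask in product((0, 1), repeat=len(elems)):
        s = sum(x for x, bit in zip(elems, mask) if bit) % N
        if s in seen:
            return False
        seen.add(s)
    return True

def greedy_dissociated(pool, N, cap):
    """Greedily build a dissociated subset of pool, stopping at size cap."""
    D = []
    for xi in pool:
        if len(D) == cap:
            break
        if is_dissociated(D + [xi], N):
            D.append(xi)
    return D

def cube(etas, N):
    """All sums c_1 eta_1 + ... + c_d eta_d mod N with c_i in {-1, 0, 1}."""
    return {sum(c * eta for c, eta in zip(cs, etas)) % N
            for cs in product((-1, 0, 1), repeat=len(etas))}

def cube_covering(S, N, d):
    """Return (blocks, R, etas): blocks D_i, remainder R, cube generators etas."""
    remaining, blocks = list(S), []
    while True:
        D = greedy_dissociated(remaining, N, d + 1)
        if len(D) < d + 1:
            # D is maximal in `remaining`, so every leftover element is a
            # {-1,0,1}-combination of D.
            return blocks, remaining, D
        blocks.append(D)
        remaining = [xi for xi in remaining if xi not in D]

blocks, R, etas = cube_covering(range(1, 41), 101, 3)
print(blocks, R, etas)
```

The exponential-time dissociativity test is only sensible for tiny {d}; it is meant to make the statement concrete, not to be efficient.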

Finally, we remind the reader that

Lemma 6 (Parseval)

We have

\displaystyle  \left\lVert f \right\rVert_{L^2Z} = \left\lVert \widehat f \right\rVert_{\ell^2Z}.

Since we don’t have Bohr sets anymore, the way we detect progressions is to use the pigeonhole principle. In what follows, let {T^n f} be the shift of {f} by {n}, id est {T^nf(x) = f(x-n)}.

Proposition 7 (Pigeonhole gives arithmetic progressions)

Let {f : Z \rightarrow \mathbb R_{\ge 0}}, {J \ge 1} and suppose {r \in \mathbb Z} is such that

\displaystyle  \mathbf E_Z \max_{1 \le j \le J} \left\lvert T^{jr}f - f \right\rvert < \mathbf E_Z f.

Then {\text{supp }(f)} contains an arithmetic progression of length {J} and spacing {r}.

Proof: Apply the pigeonhole principle to find an {x} such that

\displaystyle  \max_{1 \le j \le J} \left\lvert T^{jr}f(x) - f(x) \right\rvert < f(x).

Then {f(x-jr) \ge f(x) - \left\lvert T^{jr}f(x) - f(x) \right\rvert > 0} for each {1 \le j \le J}, and the claim follows. \Box
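The pigeonhole argument is completely constructive, so it can be run. Below is a small Python sketch (my own, with hypothetical names) on {\mathbb Z_{101}}, taking {f = 1_A \ast 1_A} for an interval {A}. It checks the hypothesis of the proposition and then extracts the progression {x, x-r, \dots, x-Jr} inside the support.

```python
def max_deviation(f, r, J):
    """Pointwise values of max_{1<=j<=J} |T^{jr} f(x) - f(x)|, with T^n f(x) = f(x-n)."""
    N = len(f)
    return [max(abs(f[(x - j * r) % N] - f[x]) for j in range(1, J + 1))
            for x in range(N)]

def find_progression(f, r, J):
    """Return x, x-r, ..., x-Jr in supp(f) if the averaged hypothesis holds, else None."""
    N = len(f)
    dev = max_deviation(f, r, J)
    if sum(dev) >= sum(f):  # E max|T^{jr}f - f| >= E f: hypothesis fails
        return None
    # Pigeonhole: some x must satisfy dev[x] < f[x].
    x = next(x for x in range(N) if dev[x] < f[x])
    return [(x - j * r) % N for j in range(J + 1)]

N = 101
f = [0.0] * N
for a in range(50):
    for b in range(50):
        f[(a + b) % N] += 1.0 / N  # normalized convolution 1_A * 1_A
progression = find_progression(f, r=1, J=5)
print(progression)
```

Every point of the returned progression has {f > 0}, exactly as in the proof.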

3. Periodicity

Proposition 8 (Estimate for {\max_{h \in H} |T^hf|} for {\text{supp }(\widehat f)} dissociated)

Let {f : Z \rightarrow \mathbb R}, {\text{supp }(\widehat f) \subseteq S \subseteq Z} with {S} dissociated. Then for any set {H} with {|H| > 1} we have

\displaystyle  \left\lVert \max_{h \in H} \left\lvert T^h f \right\rvert \right\rVert_{L^2Z} \ll \sqrt{\log|H|} \left\lVert f \right\rVert_{L^2Z}.

Proof: Let {p > 2} be large and note

\displaystyle  \begin{aligned} \left\lVert \max_{h \in H} \left\lvert T^h f \right\rvert \right\rVert_{L^2Z} &\le \left\lVert \max_{h \in H} \left\lvert T^h f \right\rvert \right\rVert_{L^pZ} \\ &\le \left\lVert \left( \sum_{h \in H} \left\lvert T^h f \right\rvert^p \right)^{1/p} \right\rVert_{L^pZ} \\ &= \left( \mathbf E_Z \left( \sum_{h \in H} \left\lvert T^h f \right\rvert^p \right) \right)^{1/p} \\ &= \left( \sum_{h \in H} \mathbf E_Z \left\lvert T^h f \right\rvert^p \right)^{1/p} \\ &= \left( \sum_{h \in H} \mathbf E_Z \left\lvert f \right\rvert^p \right)^{1/p} \\ &= \left\lvert H \right\rvert^{1/p} \left\lVert \sum_\xi \widehat f(\xi) e(\xi \cdot x) \right\rVert_{L^pZ} \\ &\le \left\lvert H \right\rvert^{1/p} \left\lVert S \right\rVert_{\Lambda(p)} \left\lVert \widehat f \right\rVert_{\ell^2Z} \\ \end{aligned}

Then by Parseval and Rudin,

\displaystyle  \begin{aligned} \left\lVert \max_{h \in H} \left\lvert T^h f \right\rvert \right\rVert_{L^2Z} &\le \left\lvert H \right\rvert^{1/p} \left\lVert S \right\rVert_{\Lambda(p)} \left\lVert f \right\rVert_{L^2Z} \\ &\ll \left\lvert H \right\rvert^{1/p} \sqrt p \left\lVert f \right\rVert_{L^2Z}. \end{aligned}

We may then take {p \asymp \log |H|}. \Box
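To see why this choice of {p} is the right one, the following tiny Python sketch (my own; the function name is hypothetical) evaluates the factor {|H|^{1/p}\sqrt p} at {p = \log|H|} and at {p = 2\log|H|}. The latter is where calculus puts the minimum, but both give {\sqrt{\log |H|}} up to a constant.

```python
from math import e, log, sqrt

def rudin_factor(size_H, p):
    """The quantity |H|^(1/p) * sqrt(p) appearing in the proof."""
    return size_H ** (1.0 / p) * sqrt(p)

for size_H in (10, 10 ** 3, 10 ** 6):
    L = log(size_H)
    # p = L gives e * sqrt(L); p = 2L gives sqrt(2 e L).
    print(size_H, rudin_factor(size_H, L), rudin_factor(size_H, 2 * L), sqrt(L))
```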

We combine these two propositions into the following lemma which applies if {\widehat f} has nonzero values of “uniform” size.

Lemma 9 (Uniformity estimate for shifts)

Let {f : Z \rightarrow \mathbb R} and {J, d > 1}. Suppose that {\widehat f} is “uniform in size” across its support, in the sense that

\displaystyle  \frac {\sup_{\xi \in \text{supp }(\widehat f)} \left\lvert \widehat f(\xi) \right\rvert} {\inf_{\xi \in \text{supp }(\widehat f)} \left\lvert \widehat f(\xi) \right\rvert} \le 2016.

Then one can find {S \subseteq Z} such that {|S| = d} and for all {r \in Z},

\displaystyle  \mathbf E_Z \max_{1 \le j \le J} \left\lvert T^{jr}f - f \right\rvert \ll \left( \sum_\xi \left\lvert \widehat f(\xi) \right\rvert \right) \left( \sqrt{\frac{\log J}{d}} + Jd\max_{\eta \in S} \left\lVert \eta \cdot r \right\rVert_{\mathbb R/\mathbb Z} \right).

Proof: Use the cube covering lemma to put {\text{supp }(\widehat f) = D_1 \sqcup \dots \sqcup D_k \sqcup R} where {R} is contained in the cube of {S = \left\{ \eta_1, \dots, \eta_d \right\}} and {|D_i| = d+1} for {1 \le i \le k}. Accordingly, we decompose {f} over its Fourier transform as

\displaystyle  f = f_1 + \dots + f_k + g

by letting {\widehat f_i} be supported on {D_i} and {\widehat g} be supported on {R}.

First, we can bound the “leftover” bits in {R}:

\displaystyle  \begin{aligned} \mathbf E_Z \max_{1 \le j \le J} \left\lvert T^{jr} g - g \right\rvert &\le \mathbf E_Z \max_{0 \le j \le J} \sum_{\xi \in R} \left\lvert \widehat f(\xi) \cdot (e(\xi \cdot (x+jr)) - e(\xi \cdot x)) \right\rvert \\ &\le \mathbf E_Z \max_{0 \le j \le J} \sum_{\xi \in R} \left\lvert \widehat f(\xi) \right\rvert \left\lvert (e(\xi \cdot (x+jr)) - e(\xi \cdot x)) \right\rvert \\ &\le \left( \sum_{\xi \in R} \left\lvert \widehat f(\xi) \right\rvert \right) \max_{\substack{0 \le j \le J \\ \xi \in R}} \left\lvert (e(\xi \cdot (x+jr)) - e(\xi \cdot x)) \right\rvert \\ &= \left( \sum_{\xi \in R} \left\lvert \widehat f(\xi) \right\rvert \right) \max_{\substack{0 \le j \le J \\ \xi \in R}} \left\lvert e(\xi \cdot jr) - 1 \right\rvert \\ &\le \left( \sum_{\xi \in R} \left\lvert \widehat f(\xi) \right\rvert \right) 2\pi \max_{\substack{0 \le j \le J \\ \xi \in R}} \left\lVert \xi \cdot jr \right\rVert_{\mathbb R/\mathbb Z} \end{aligned}

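Since each {\xi \in R} can be written as {c_1\eta_1 + \dots + c_d\eta_d} with {c_i \in \{-1,0,1\}} and {S = \left\{ \eta_1, \dots, \eta_d \right\}}, the triangle inequality gives {\left\lVert \xi \cdot jr \right\rVert_{\mathbb R/\mathbb Z} \le Jd \max_{\eta \in S} \left\lVert \eta \cdot r \right\rVert_{\mathbb R/\mathbb Z}} for {0 \le j \le J}, and we get

\displaystyle  \mathbf E_Z \max_{1 \le j \le J} \left\lvert T^{jr} g - g \right\rvert \le \left( \sum_{\xi \in R} \left\lvert \widehat f(\xi) \right\rvert \right) 2\pi Jd \max_{\eta \in S} \left\lVert \eta \cdot r \right\rVert_{\mathbb R/\mathbb Z}.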

Let’s then bound the contribution over each dissociated set. We’ll need both the assumption of uniformity and the proposition we proved for dissociated sets.

\displaystyle  \begin{aligned} \mathbf E_Z \max_{1 \le j \le J} \left\lvert T^{jr} f_i - f_i \right\rvert &\le 2\mathbf E_Z \max_{0 \le j \le J} \left\lvert T^{jr} f_i \right\rvert \\ &\le 2\left\lVert \max_{0 \le j \le J} \left\lvert T^{jr} f_i \right\rvert \right\rVert_{L^2Z} \\ &\ll \sqrt{\log(J)} \left\lVert f_i \right\rVert_{L^2Z} \\ &= \sqrt{\log(J)} \sqrt{\sum_{\xi \in D_i} \left\lvert \widehat f(\xi) \right\rvert^2 } \\ &\ll \sqrt{\frac{\log J}{d}} \sum_{\xi \in D_i} \left\lvert \widehat f(\xi) \right\rvert \end{aligned}

where the last step is by uniformity of {\widehat f} together with {|D_i| = d+1}. Now combine everything with the triangle inequality. \Box

4. Proof of main theorem

Without loss of generality {\mathbf P_ZA = \mathbf P_ZB = \delta}. Of course, we let {f = 1_A \ast 1_B} so {\mathbf E_Z f = \delta^2}. We will have parameters {d \ge 1}, {M \ge 1}, and {J \ge \exp(C\sqrt[3]{\delta^2 \log N})} which we will select at the end.

Our goal is to show there exists some integer {r} such that

\displaystyle  \mathbf E_Z \max_{1 \le j < J} \left\lvert T^{jr} f - f \right\rvert < \delta^2.

Now we cannot apply the uniformity estimate directly since {f} is probably not uniform, and therefore we impose a dyadic decomposition on the base group {Z}; let

\displaystyle  \begin{aligned} Z_0 &= \left\{ \xi \in Z \;:\; \frac{1}{2} \delta^2 < \left\lvert \widehat f(\xi) \right\rvert \le \delta^2 \right\} \\ Z_1 &= \left\{ \xi \in Z \;:\; \frac14\delta^2 < \left\lvert \widehat f(\xi) \right\rvert \le \frac{1}{2}\delta^2 \right\} \\ Z_2 &= \left\{ \xi \in Z \;:\; \frac18\delta^2 < \left\lvert \widehat f(\xi) \right\rvert \le \frac14\delta^2 \right\} \\ &\vdots \\ Z_{M-1} &= \left\{ \xi \in Z \;:\; 2^{-M} \delta^2 < \left\lvert \widehat f(\xi) \right\rvert \le 2^{-M+1} \delta^2 \right\} \\ Z_{\mathrm{err}} &= \left\{ \xi \in Z \;:\; \left\lvert \widehat f(\xi) \right\rvert \le 2^{-M} \delta^2 \right\} \end{aligned}

Then as before we can decompose via Fourier transform to obtain

\displaystyle  f = f_0 + f_1 + \dots + f_{M-1} + f_{\mathrm{err}}

so that {\widehat f_i} is supported on {Z_i}.
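The dyadic decomposition is simple to carry out explicitly. Here is a Python sketch of my own on {\mathbb Z_{101}}, with hypothetical names and arbitrarily chosen sets {A}, {B} of equal size. It assumes the normalization {\widehat f(\xi) = \mathbf E_x f(x) e(-\xi x/N)}, so that {\widehat f = \widehat 1_A \cdot \widehat 1_B}, and sorts the frequencies into {Z_0, \dots, Z_{M-1}} and {Z_{\mathrm{err}}}.

```python
import cmath

def fourier(f):
    """Normalized Fourier transform on Z_N: hat f(xi) = E_x f(x) e(-xi x / N)."""
    N = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * xi * x / N) for x in range(N)) / N
            for xi in range(N)]

def dyadic_levels(hat, top, M):
    """Split frequencies so level m has top/2^(m+1) < |hat| <= top/2^m; the rest is err."""
    levels, err = [[] for _ in range(M)], []
    for xi, coeff in enumerate(hat):
        size, m = abs(coeff), 0
        while m < M and size <= top * 2.0 ** -(m + 1):
            m += 1
        (err if m == M else levels[m]).append(xi)
    return levels, err

N = 101
A = set(range(20))
B = {(7 * x) % N for x in range(20)}
hat_A = fourier([1 if x in A else 0 for x in range(N)])
hat_B = fourier([1 if x in B else 0 for x in range(N)])
hat_f = [a * b for a, b in zip(hat_A, hat_B)]  # transform of f = 1_A * 1_B
delta = len(A) / N
levels, err = dyadic_levels(hat_f, delta ** 2, M=6)
print([len(level) for level in levels], len(err))
```

Within each level the coefficients differ by at most a factor of {2}, which is the kind of uniformity the previous lemma asks for.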

Now we can apply the previous lemma to get for each {0 \le m < M}:

\displaystyle  \mathbf E_Z \max_{1 \le j \le J} \left\lvert T^{jr} f_m - f_m \right\rvert \ll \left( \sum_{\xi \in Z_m} \left\lvert \widehat f(\xi) \right\rvert \right) \left( \sqrt{\frac{\log J}{d}} + Jd\max_{\eta \in S_m} \left\lVert \eta \cdot r \right\rVert_{\mathbb R/\mathbb Z} \right)

for some {S_m}; hence by summing and using the fact that

\displaystyle  \sum_{\xi \in Z} \left\lvert \widehat f(\xi) \right\rvert = \sum_{\xi \in Z} \left\lvert \widehat 1_A(\xi) \right\rvert \left\lvert \widehat 1_B(\xi) \right\rvert \le \left\lVert \widehat 1_A \right\rVert_{\ell^2Z} \left\lVert \widehat 1_B \right\rVert_{\ell^2Z} = \left\lVert 1_A \right\rVert_{L^2Z} \left\lVert 1_B \right\rVert_{L^2Z} = \sqrt{\mathbf P_ZA \mathbf P_ZB} = \delta

we obtain that

\displaystyle  \sum_{0 \le m < M} \mathbf E_Z \max_{1 \le j \le J} \left\lvert T^{jr} f_m - f_m \right\rvert \ll \delta \left( \sqrt{\frac{\log J}{d}} + Jd\max_{\eta \in \bigcup S_m} \left\lVert \eta \cdot r \right\rVert_{\mathbb R/\mathbb Z} \right).

As for the “error” term, we bound

\displaystyle  \begin{aligned} \mathbf E_Z \max_{1 \le j \le J} \left\lvert T^{jr} f_{\mathrm{err}} - f_{\mathrm{err}} \right\rvert &\le 2\mathbf E_Z \max_{1 \le j \le J} \left\lvert T^{jr} f_{\mathrm{err}} \right\rvert \\ &\le 2\mathbf E_Z \sum_{1 \le j \le J} \left\lvert T^{jr} f_{\mathrm{err}} \right\rvert \\ &\le 2\sum_{1 \le j \le J} \mathbf E_Z \left\lvert T^{jr} f_{\mathrm{err}} \right\rvert \\ &\le 2\sum_{1 \le j \le J} \mathbf E_Z \left\lvert f_{\mathrm{err}} \right\rvert \\ &\le 2J \mathbf E_Z \left\lvert f_{\mathrm{err}} \right\rvert \\ &\le 2J \left\lVert f_{\mathrm{err}} \right\rVert_{L^2Z} \\ &= 2J \left\lVert \widehat f_{\mathrm{err}} \right\rVert_{\ell^2 Z} \\ &= 2J \sqrt{\sum_{\xi \in Z_{\mathrm{err}}} \left\lvert \widehat f_{\mathrm{err}}(\xi) \right\rvert^2} \\ &\le 2J \sqrt{\max_{\xi \in Z_{\mathrm{err}}} \left\lvert \widehat f_{\mathrm{err}}(\xi) \right\rvert \sum_{\xi \in Z_{\mathrm{err}}} \left\lvert \widehat f_{\mathrm{err}}(\xi) \right\rvert} \\ &\le 2J \sqrt{2^{-M}\delta^2 \cdot \delta} \\ &= 2J 2^{-M/2} \delta^{3/2} \\ &\le 2J 2^{-M/2} \delta. \end{aligned}

Thus, putting these all together, we need to find {r \neq 0} such that

\displaystyle  \sqrt{\frac{\log J}{d}} + Jd \max_{\eta\in\bigcup S_m} \left\lVert \eta \cdot r \right\rVert_{\mathbb R/\mathbb Z} + 2J \cdot2^{-M/2} \ll \delta.

Now set {M \asymp \log J} and {d \asymp \delta^{-2} \log J}, so the first and third terms are less than {\frac13 c \delta}, since by hypothesis

\displaystyle  \delta \gg \sqrt{\frac{(\log \log N)^3}{\log N}}

from which we deduce

\displaystyle  J \gg \exp\left( C\sqrt[3]{\delta^2\log N} \right) \ge \exp\left( C\log \log N \right) = (\log N)^C \gg \delta^{-1}.

Thus it suffices that

\displaystyle  \max_{\eta\in S} \left\lVert \eta \cdot r \right\rVert_{\mathbb R/\mathbb Z} \ll \frac{\delta^3}{J \log J}

where {S = \bigcup S_m}. Note {\left\lvert S \right\rvert \le dM \ll \left( \frac{\log J}{\delta} \right)^2}. Now we recall the result that

\displaystyle  \left\lvert \text{Bohr }(S, \rho) \right\rvert \ge |Z| \rho^{|S|}

and so it suffices for us that

\displaystyle  N \cdot \left( \frac{c_1 \delta^3}{J \log J} \right) ^{c_2 \left( \delta^{-1} \log J \right)^2} > 1

for constants {c_1} and {c_2}. Then {J = \exp(C\sqrt[3]{\delta^2 \log N})} works now.

18.099 Transcript: Chang’s Theorem

As part of the 18.099 discrete analysis reading group at MIT, I presented section 4.7 of Tao-Vu’s Additive Combinatorics textbook. Here were the notes I used for the first part of my presentation.

1. Synopsis

In the previous few lectures we’ve worked hard at developing the notion of characters, Bohr sets, and spectra. Today we put this all together to prove some Szemerédi-style results on arithmetic progressions in {\mathbb Z_N}.

Recall that Szemerédi’s Theorem states that:

Theorem 1 (Szemerédi)

Let {k \ge 3} be an integer. Then for sufficiently large {N}, any subset of {\{1, \dots, N\}} with density at least

\displaystyle  \frac{1}{(\log \log N)^{2^{-2^{k+9}}}}

contains a length {k} arithmetic progression.

Notice that the density approaches zero as {N \rightarrow \infty} but it does so extremely slowly.

Our goal is to show much better results for sets like {2A-2A}, {A+B+C} or {A+B}. In this post we will prove:

Theorem 2 (Chang’s Theorem)

Let {K,N \ge 1} and let {A \subseteq Z = \mathbb Z_N}. Suppose {E(A,A) \ge |A|^3 / K}, and let

\displaystyle  d \ll K\left( 1+\log \frac{1}{\mathbf P_Z A} \right).

Then there is a proper symmetric progression {P \subseteq 2A-2A} of rank at most {d} and density

\displaystyle  \mathbf P_Z P \ge d^{-d}.

For example, the hypothesis {E(A,A) \ge |A|^3/K} holds whenever {|A \pm A| \le K|A|}, i.e. if {A} has small Ruzsa diameter. One can also always pick {K = 1/\mathbf P_Z A}, but then {d} becomes quite large.

We also prove that

Theorem 3

Let {K,N \ge 1} and let {A, B, C \subseteq Z = \mathbb Z_N}. Suppose {|A|=|B|=|C| \ge \frac{1}{K}|A+B+C|} and now let

\displaystyle  d \ll K^2\left( 1+\log \frac{1}{\mathbf P_Z A} \right).

Then there is a proper symmetric progression {P \subseteq A+B+C} of rank at most {d} and

\displaystyle  \mathbf P_Z P \ge d^{-d}.

2. Main steps

Let {S} be the set we want to study (for us, {S=2A-2A} or {S=A+B+C}). Our strategy takes the following four steps.

Step 1. Analyze the Fourier coefficients of {\widehat 1_S}. Note in particular the identities

\displaystyle  \begin{aligned} \left\lVert \widehat 1_A \right\rVert_{\ell^\infty(Z)} &= \mathbf P_Z A \\ \left\lVert \widehat 1_A \right\rVert_{\ell^2(Z)} &= \sqrt{\mathbf P_Z A} \\ \left\lVert \widehat 1_A \right\rVert_{\ell^4(Z)}^4 &= \frac{E(A,A)}{|Z|^3}. \end{aligned}
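These identities are easy to confirm numerically. The Python sketch below is my own, with hypothetical names and an arbitrary test set {A \subseteq \mathbb Z_{31}}. It assumes the normalization {\widehat f(\xi) = \mathbf E_x f(x) e(-\xi x / N)} and checks the sup norm, the squared {\ell^2} norm, and the sum of fourth powers {\sum_\xi |\widehat 1_A(\xi)|^4 = E(A,A)/|Z|^3}.

```python
import cmath

def fourier(f):
    """Normalized Fourier transform on Z_N: hat f(xi) = E_x f(x) e(-xi x / N)."""
    N = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * xi * x / N) for x in range(N)) / N
            for xi in range(N)]

def additive_energy(A, N):
    """E(A,A) = number of (a, b, c, d) in A^4 with a + b = c + d in Z_N."""
    reps = [0] * N
    for a in A:
        for b in A:
            reps[(a + b) % N] += 1
    return sum(r * r for r in reps)

N = 31
A = [1, 2, 5, 11, 17, 20]
hat = fourier([1 if x in A else 0 for x in range(N)])
density = len(A) / N

sup_norm = max(abs(c) for c in hat)
l2_squared = sum(abs(c) ** 2 for c in hat)
l4_fourth = sum(abs(c) ** 4 for c in hat)
print(sup_norm, density)
print(l2_squared, density)
print(l4_fourth, additive_energy(A, N) / N ** 3)
```

Each printed pair agrees up to floating-point error.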

Recall also from the first section of Chapter 4 that

  • The support of {1_A \ast 1_B} is {A+B}.
  • {\widehat{f \ast g} = \widehat f \cdot \widehat g}.
  • {f(x) = \sum_\xi \widehat f(\xi) e(\xi \cdot x)}.

Step 2. Find a set of the form {\text{Bohr }(\text{Spec }_\alpha A, \rho)} contained completely inside {S}. Recall that by expanding definitions:

\displaystyle  \text{Bohr }(\text{Spec }_\alpha A, \rho) = \left\{ x \in Z \mid \sup_{\xi \; : \; \left\lvert \widehat 1_A(\xi) \right\rvert \ge \alpha \mathbf P_ZA} \left\lVert \xi \cdot x \right\rVert_{\mathbb R/\mathbb Z} < \rho \right\}.

Step 3. Use the triangle inequality and the Fourier concentration lemma (covering). Recall that this says:

Lemma 4 (Fourier Concentration, or “Covering Lemma”, Tao-Vu 4.36)

Let {A \subseteq Z}, and let {0 < \alpha \le 1}. Then one can pick {\eta_1}, {\dots}, {\eta_d} such that

\displaystyle  d \ll \frac{1 + \log \frac{1}{\mathbf P_ZA}}{\alpha^2}

and {\text{Spec }_\alpha A} is contained in a {d}-cube, i.e. it’s covered by {c_1\eta_1 + \dots + c_d\eta_d} where {c_i \in \{-1,0,1\}}.

Using such a {d}, we have by the triangle inequality

\displaystyle  \text{Bohr }\left(\{\eta_1, \dots, \eta_d\}, \frac{\rho}{d} \right) \subseteq \text{Bohr }\left( \text{Spec }_\alpha A, \rho \right).  \ \ \ \ \ (1)
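The inclusion (1) only uses that every frequency in the spectrum is a {\{-1,0,1\}}-combination of the {\eta_i}. Here is a brute-force Python check of my own on {\mathbb Z_{101}}; the generators, the radius, and the function names are arbitrary illustrative choices, and the whole cube stands in for the spectrum.

```python
from itertools import product

def circle_norm(k, N):
    """||k/N|| in R/Z for an integer k."""
    k %= N
    return min(k, N - k) / N

def bohr(freqs, rho, N):
    """Bohr set: all x in Z_N with ||xi * x / N|| < rho for every xi in freqs."""
    return {x for x in range(N) if all(circle_norm(xi * x, N) < rho for xi in freqs)}

def cube(etas, N):
    """All sums c_1 eta_1 + ... + c_d eta_d mod N with c_i in {-1, 0, 1}."""
    return {sum(c * eta for c, eta in zip(cs, etas)) % N
            for cs in product((-1, 0, 1), repeat=len(etas))}

N, etas, rho = 101, [3, 10, 47], 0.3
small = bohr(etas, rho / len(etas), N)  # Bohr({eta_1..eta_d}, rho/d)
big = bohr(cube(etas, N), rho, N)       # Bohr(anything inside the cube, rho)
print(sorted(small), small <= big)
```

Since any subset of the cube defines a Bohr set at least as large as `big`, the same inclusion holds for {\text{Spec }_\alpha A} whenever it sits inside the cube.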

Step 4. We use the fact that Bohr sets contain long arithmetic progressions:

Theorem 5 (Bohr sets have long coset progressions, Tao-Vu 4.23)

Let {Z = \mathbb Z_N}. Then within {\text{Bohr }(S, r)} one can select a proper symmetric progression {P} such that

\displaystyle  \mathbf P_Z P \ge \left( \frac{r}{|S|} \right)^{|S|}

and {\text{rank } P \le |S|}.

The third step is necessary because in the bound for the preceding theorem, the dependence on {|S|} is much more severe than the dependence on {r}. Therefore it is necessary to use the Fourier concentration lemma in order to reduce the size of {|S|} before applying the result.

3. Proof of Chang’s theorem

First, we do the first two steps in the following proposition.

Proposition 6

Let {A \subseteq Z}, {0 < \alpha \le 1}. Assume {E(A,A) \ge 4\alpha^2 |A|^3}. Then

\displaystyle  \text{Bohr }\left(\text{Spec }_\alpha A, \frac 16\right) \subseteq 2A-2A.

Proof: To do this, as advertised consider

\displaystyle  f = 1_A \ast 1_A \ast 1_{-A} \ast 1_{-A}(x).

We want to show that any {x \in \text{Bohr }(\text{Spec }_\alpha A, \frac 16)} lies in the support of {f}. Note that if {x} does lie in this Bohr set, we have

\displaystyle  \text{Re } e(\xi \cdot x) \ge \frac{1}{2} \qquad \forall \xi \in \text{Spec }_\alpha A.

We aim to show now {f(x) > 0}. This follows by computing

\displaystyle  \begin{aligned} f(x) &= 1_A \ast 1_A \ast 1_{-A} \ast 1_{-A}(x) \\ &= \sum_\xi \widehat 1_A(\xi)^2 \widehat 1_{-A}(\xi)^2 e(\xi \cdot x) \\ &= \sum_\xi |\widehat 1_A(\xi)|^4 e(\xi \cdot x) \end{aligned}

Now we split the sum over {\text{Spec }_\alpha A}:

\displaystyle  \begin{aligned} f(x) &= \sum_{\xi \in \text{Spec }_\alpha(A)} |\widehat 1_A(\xi)|^4 e(\xi \cdot x) + \sum_{\xi \notin \text{Spec }_\alpha(A)} |\widehat 1_A(\xi)|^4 e(\xi \cdot x). \end{aligned}

Now we take the real part of both sides:

\displaystyle  \begin{aligned} \text{Re } f(x) &\ge \sum_{\xi \in \text{Spec }_\alpha(A)} |\widehat 1_A(\xi)|^4 \cdot \frac{1}{2} - \sum_{\xi \notin \text{Spec }_\alpha(A)} |\widehat 1_A(\xi)|^4 \\ &= \frac{1}{2} \sum_{\xi} |\widehat 1_A(\xi)|^4 - \frac32 \sum_{\xi \notin \text{Spec }_\alpha(A)} |\widehat 1_A(\xi)|^4 \\ &= \frac{1}{2} \frac{E(A,A)}{|Z|^3} - \frac32 \sum_{\xi \notin \text{Spec }_\alpha(A)} |\widehat 1_A(\xi)|^4. \end{aligned}

By definition of {\text{Spec }_\alpha A} we can bound two of the {\left\lvert \widehat 1_A(\xi) \right\rvert}’s via

\displaystyle  \begin{aligned} \text{Re } f(x) &\ge \frac{1}{2} \frac{E(A,A)}{|Z|^3} - \frac32 (\alpha\mathbf P_Z A)^2 \sum_{\xi \notin \text{Spec }_\alpha(A)} |\widehat 1_A(\xi)|^2 \end{aligned}

Now the last sum is the square of the {\ell^2} norm, hence

\displaystyle  \begin{aligned} \text{Re } f(x) &\ge \frac{1}{2} \frac{E(A,A)}{|Z|^3} - \frac32 (\alpha\mathbf P_Z A)^2 \cdot \mathbf P_ZA \\ &\ge \frac{1}{2} \frac{E(A,A)}{|Z|^3} - \frac32 \alpha^2 \frac{|A|^3}{|Z|^3} > 0 \end{aligned}

by the assumption {E(A,A) \ge 4\alpha^2 |A|^3}. \Box
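The proposition can be tested directly on a small example. The Python sketch below is my own, with hypothetical names and an arbitrary set {A \subseteq \mathbb Z_{101}}. It computes the spectrum for a value of {\alpha} satisfying {E(A,A) \ge 4\alpha^2|A|^3}, forms the Bohr set of radius {1/6}, and compares it with {2A-2A}. The Fourier normalization {\widehat f(\xi) = \mathbf E_x f(x)e(-\xi x/N)} is assumed.

```python
import cmath

def fourier(f):
    """Normalized Fourier transform on Z_N: hat f(xi) = E_x f(x) e(-xi x / N)."""
    N = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * xi * x / N) for x in range(N)) / N
            for xi in range(N)]

def additive_energy(A, N):
    """E(A,A) = number of (a, b, c, d) in A^4 with a + b = c + d in Z_N."""
    reps = [0] * N
    for a in A:
        for b in A:
            reps[(a + b) % N] += 1
    return sum(r * r for r in reps)

def bohr_of_spectrum(A, N, alpha):
    """Bohr(Spec_alpha A, 1/6) as a subset of Z_N."""
    hat = fourier([1 if x in A else 0 for x in range(N)])
    density = len(A) / N
    spec = [xi for xi in range(N) if abs(hat[xi]) >= alpha * density]
    # ||xi*x/N|| < 1/6 is tested exactly in integers as 6*min(k, N-k) < N.
    return {x for x in range(N)
            if all(6 * min(xi * x % N, N - xi * x % N) < N for xi in spec)}

def two_A_minus_two_A(A, N):
    sums = {(a + b) % N for a in A for b in A}
    return {(s - t) % N for s in sums for t in sums}

N = 101
A = set(range(10)) | {40, 41}
# Any alpha with E(A,A) >= 4 alpha^2 |A|^3 works; 0.9 leaves a little slack.
alpha = 0.9 * (additive_energy(A, N) / (4 * len(A) ** 3)) ** 0.5
bohr_set = bohr_of_spectrum(A, N, alpha)
print(sorted(bohr_set), bohr_set <= two_A_minus_two_A(A, N))
```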

Now, let {\alpha = \frac{1}{2\sqrt K}}, and let

\displaystyle  d \ll \frac{1 + \log \frac{1}{\mathbf P_Z A}}{\alpha^2} \ll K\left( 1 + \log \frac{1}{\mathbf P_Z A} \right).

Then by (1), we have

\displaystyle  \text{Bohr }\left(\{\eta_1, \dots, \eta_d\}, \frac{1}{6d} \right) \subseteq \text{Bohr }\left( \text{Spec }_\alpha A, \frac16 \right) \subseteq 2A-2A,

and then using the main result on Bohr sets, we can find a symmetric progression of density at least

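\displaystyle  \left( \frac{1/6d}{d} \right)^d = (6d^2)^{-d} \ge (3d)^{-3d}

and whose rank is at most {d}. Since {d} is only specified up to a constant factor, replacing {d} by {3d} gives the density bound {d^{-d}}. This completes the proof of Chang’s theorem.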

4. Proof of the second theorem

This time, the Bohr set we want to use is:

Proposition 7

Let {\alpha = \frac{1}{2\pi K}}. Then

\displaystyle  \text{Bohr }\left(\text{Spec }_\alpha A, \frac{1}{2\pi K}\right) \subseteq A+B+C.

Proof: Let {f = 1_A \ast 1_B \ast 1_C}. Note that we have {\mathbf P_Z(A+B+C) \le K\mathbf P_Z A}, while {\mathbf E_Z f = (\mathbf P_ZA)^3}. Since {f} is supported on {A+B+C}, some value of {f} is at least its average over the support. So by shifting {C}, we may assume without loss of generality that

\displaystyle  f(0) \ge \frac{(\mathbf P_ZA)^3}{K\mathbf P_ZA} = \frac{1}{K} (\mathbf P_ZA)^2.

Now, consider {x} in the Bohr set. Then we have

\displaystyle  \begin{aligned} \left\lvert f(x)-f(0) \right\rvert &= \left\lvert \sum_\xi \widehat1_A(\xi) \widehat1_B(\xi) \widehat1_C(\xi) \left( e(\xi \cdot x) - 1 \right) \right\rvert \\ &\le \sum_\xi \left\lvert \widehat 1_A(\xi) \right\rvert \left\lvert \widehat 1_B(\xi) \right\rvert \left\lvert \widehat 1_C(\xi) \right\rvert \left\lvert e(\xi \cdot x) - 1 \right\rvert \\ &\le 2\pi \sum_\xi \left\lvert \widehat 1_A(\xi) \right\rvert \left\lvert \widehat 1_B(\xi) \right\rvert \left\lvert \widehat 1_C(\xi) \right\rvert \left\lVert \xi \cdot x \right\rVert_{\mathbb R/\mathbb Z}. \end{aligned}

Bounding by the maximum for {A}, and then using Cauchy-Schwarz,

\displaystyle  \begin{aligned} \left\lvert f(x)-f(0) \right\rvert &\le 2\pi \sup_\xi \left( \left\lvert \widehat 1_A(\xi) \right\rvert \left\lVert \xi \cdot x \right\rVert_{\mathbb R/\mathbb Z} \right) \sum_\xi \left\lvert \widehat 1_B(\xi) \right\rvert \left\lvert \widehat 1_C(\xi) \right\rvert \\ &\le 2\pi \sup_\xi \left( \left\lvert \widehat 1_A(\xi) \right\rvert \left\lVert \xi \cdot x \right\rVert_{\mathbb R/\mathbb Z} \right) \sqrt{ \sum_\xi \left\lvert \widehat 1_B(\xi) \right\rvert^2 \sum_\xi \left\lvert \widehat 1_C(\xi) \right\rvert^2} \\ &\le 2\pi \mathbf P_Z A \cdot \sup_\xi \left( \left\lvert \widehat 1_A(\xi) \right\rvert \left\lVert \xi \cdot x \right\rVert_{\mathbb R/\mathbb Z} \right) \end{aligned}

Claim: if {x \in \text{Bohr }(\text{Spec }_\alpha A, \frac{1}{2\pi K})} and {\xi \in Z} then

\displaystyle  \left\lvert \widehat 1_A(\xi) \right\rvert \left\lVert \xi \cdot x \right\rVert_{\mathbb R/\mathbb Z} < \frac{1}{2\pi K} \mathbf P_ZA

Indeed one just considers two cases:

  • If {\xi \in \text{Spec }_\alpha A}, then {\left\lVert \xi \cdot x \right\rVert_{\mathbb R/\mathbb Z} < \alpha} ({x} in Bohr set) and {\left\lvert \widehat1_A(\xi) \right\rvert \le \mathbf P_ZA}.
  • If {\xi \notin \text{Spec }_\alpha A}, then {\left\lvert \widehat 1_A(\xi) \right\rvert < \alpha \mathbf P_ZA} ({\xi} outside Spec) and {\left\lVert \xi \cdot x \right\rVert_{\mathbb R/\mathbb Z} \le 1}.

So finally, we have

\displaystyle  \left\lvert f(x)-f(0) \right\rvert \le 2\pi \mathbf P_Z A \cdot \sup_\xi \left( \left\lvert \widehat 1_A(\xi) \right\rvert \left\lVert \xi \cdot x \right\rVert_{\mathbb R/\mathbb Z} \right) < \frac{(\mathbf P_ZA)^2}{K} \le f(0)

and this implies {f(x) \neq 0}. \Box

Once more by (1), we have

\displaystyle  \text{Bohr }\left(\{\eta_1, \dots, \eta_d\}, \frac{1}{2\pi Kd} \right) \subseteq \text{Bohr }\left( \text{Spec }_\alpha A, \frac{1}{2\pi K} \right) \subseteq A+B+C

where

\displaystyle  d \ll \frac{1+\log \frac{1}{\mathbf P_ZA}}{\alpha^2} \ll K^2\left( 1 + \log \frac{1}{\mathbf P_Z A} \right).

So by the main theorem on Bohr sets again, there is a symmetric progression of density at least

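\displaystyle  \left( \frac{\frac{1}{2\pi Kd}}{d} \right)^d = (2\pi K d^2)^{-d}

and rank at most {d}. As before, this is of the form {d^{-d}} after enlarging {d} by a constant factor (we may assume {d \ge K^2}). This completes the proof of the second theorem.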