# Facts about Lie Groups and Algebras

In Spring 2016 I was taking 18.757 Representations of Lie Algebras. Since I knew next to nothing about either Lie groups or algebras, I was forced to quickly learn about their basic facts and properties. These are the notes that I wrote up accordingly. Proofs of most of these facts can be found in standard textbooks, for example Kirillov.

## 1. Lie groups

Let ${K = \mathbb R}$ or ${K = \mathbb C}$, depending on taste.

Definition 1

A Lie group is a group ${G}$ which is also a ${K}$-manifold; the multiplication maps ${G \times G \rightarrow G}$ (by ${(g_1, g_2) \mapsto g_1g_2}$) and the inversion map ${G \rightarrow G}$ (by ${g \mapsto g^{-1}}$) are required to be smooth.

A morphism of Lie groups is a map which is both a map of manifolds and a group homomorphism.

Throughout, we will let ${e \in G}$ denote the identity, or ${e_G}$ if we need further emphasis.

Note that in particular, every group ${G}$ can be made into a Lie group by endowing it with the discrete topology. This is silly, so we usually restrict our focus to connected groups:

Proposition 2 (Reduction to connected Lie groups)

Let ${G}$ be a Lie group and ${G^0}$ the connected component of ${G}$ which contains ${e}$. Then ${G^0}$ is a normal subgroup, itself a Lie group, and the quotient ${G/G^0}$ has the discrete topology.

In fact, we can also reduce this to the study of simply connected Lie groups as follows.

Proposition 3 (Reduction to simply connected Lie groups)

If ${G}$ is connected, let ${\pi : \widetilde G \rightarrow G}$ be its universal cover. Then ${\widetilde G}$ is a Lie group, ${\pi}$ is a morphism of Lie groups, and ${\ker \pi \cong \pi_1(G)}$.

Here are some examples of Lie groups.

Example 4 (Examples of Lie groups)

• ${\mathbb R}$ under addition is a real one-dimensional Lie group.
• ${\mathbb C}$ under addition is a complex one-dimensional Lie group (and a two-dimensional real Lie group)!
• The unit circle ${S^1 \subseteq \mathbb C}$ is a real Lie group under multiplication.
• ${\text{GL }(n, K) \subset K^{\oplus n^2}}$ is a Lie group of dimension ${n^2}$. This example becomes important for representation theory: a representation of a Lie group ${G}$ is a morphism of Lie groups ${G \rightarrow \text{GL }(n, K)}$.
• ${\text{SL }(n, K) \subset \text{GL }(n, K)}$ is a Lie group of dimension ${n^2-1}$.

As geometric objects, Lie groups ${G}$ enjoy a huge amount of symmetry. For example, any neighborhood ${U}$ of ${e}$ can be “copied over” to any other point ${g \in G}$ by the translation map ${u \mapsto gu}$, giving the neighborhood ${gU}$ of ${g}$. Another theorem worth noting is:

Proposition 5

If ${G}$ is a connected Lie group and ${U}$ is a neighborhood of the identity ${e \in G}$, then ${U}$ generates ${G}$ as a group.

## 2. Haar measure

Recall the following result and its proof from representation theory:

Claim 6

For any finite group ${G}$, ${\mathbb C[G]}$ is semisimple; all finite-dimensional representations decompose into irreducibles.

Proof: Take a representation ${V}$ and equip it with an arbitrary inner form ${\left< -,-\right>_0}$. Then we can average it to obtain a new inner form

$\displaystyle \left< v, w \right> = \frac{1}{|G|} \sum_{g \in G} \left< gv, gw \right>_0.$

which is ${G}$-invariant. Thus given a subrepresentation ${W \subseteq V}$ we can just take its orthogonal complement to decompose ${V}$. $\Box$
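
To see the averaging trick in action, here is a small computational sketch (my own illustration, not from the course): we average an arbitrary inner form over the permutation representation of ${S_3}$ on ${\mathbb R^3}$ and check that the result is ${G}$-invariant while the original form is not.

```python
from itertools import permutations
from fractions import Fraction

group = list(permutations(range(3)))   # S3, acting on R^3 by permuting coordinates

def act(g, v):
    """Apply the permutation g to the vector v."""
    return tuple(v[g[i]] for i in range(3))

def form0(v, w):
    """An arbitrary (non-invariant) inner form <v, w>_0 = v^T M w."""
    M = [[2, 1, 0], [1, 3, 0], [0, 0, 1]]
    return sum(M[i][j] * v[i] * w[j] for i in range(3) for j in range(3))

def form(v, w):
    """The averaged form <v, w> = (1/|G|) * sum_g <g v, g w>_0."""
    return Fraction(sum(form0(act(g, v), act(g, w)) for g in group), len(group))

v, w = (1, 2, 3), (0, 1, -1)
# The averaged form is G-invariant; the original form is not.
assert all(form(act(h, v), act(h, w)) == form(v, w) for h in group)
assert any(form0(act(h, v), act(h, w)) != form0(v, w) for h in group)
```
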
We would like to repeat this type of proof with Lie groups. In this case the notion ${\sum_{g \in G}}$ doesn’t make sense, so we want to replace it with an integral ${\int_{g \in G}}$ instead. In order to do this we use the following:

Theorem 7 (Haar measure)

Let ${G}$ be a Lie group. Then there exists a unique Radon measure ${\mu}$ (up to scaling) on ${G}$ which is left-invariant, meaning

$\displaystyle \mu(g \cdot S) = \mu(S)$

for any Borel subset ${S \subseteq G}$ and “translate” ${g \in G}$. This measure is called the (left) Haar measure.

Example 8 (Examples of Haar measures)

• The Haar measure on ${(\mathbb R, +)}$ is the standard Lebesgue measure which assigns ${1}$ to the closed interval ${[0,1]}$. Of course for any ${S}$, ${\mu(a+S) = \mu(S)}$ for ${a \in \mathbb R}$.
• The Haar measure on ${(\mathbb R \setminus \{0\}, \times)}$ is given by

$\displaystyle \mu(S) = \int_S \frac{1}{|t|} \; dt.$

In particular, ${\mu([a,b]) = \log(b/a)}$. One sees the invariance under multiplication of these intervals.

• Let ${G = \text{GL }(n, \mathbb R)}$. Then a Haar measure is given by

$\displaystyle \mu(S) = \int_S |\det(X)|^{-n} \; dX.$

• For the circle group ${S^1}$, consider ${S \subseteq S^1}$. We can define

$\displaystyle \mu(S) = \frac{1}{2\pi} \int_S d\varphi$

across complex arguments ${\varphi}$. The normalization factor of ${2\pi}$ ensures ${\mu(S^1) = 1}$.
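
As a quick numerical sanity check (a sketch I am adding; the endpoints and scale factor are arbitrary), one can approximate ${\mu([a,b]) = \int_a^b dt/|t|}$ by the midpoint rule and confirm both the ${\log(b/a)}$ formula and the invariance under multiplication:

```python
import math

def mu(a, b, steps=100000):
    """Approximate mu([a, b]) = int_a^b dt/|t| by the midpoint rule (0 < a < b)."""
    h = (b - a) / steps
    return sum(h / abs(a + (k + 0.5) * h) for k in range(steps))

a, b, c = 2.0, 5.0, 7.0
assert abs(mu(a, b) - math.log(b / a)) < 1e-6
# Left-invariance under the group operation: mu(c * [a, b]) = mu([a, b]).
assert abs(mu(c * a, c * b) - mu(a, b)) < 1e-6
```
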

Note that we have:

Corollary 9

If the Lie group ${G}$ is compact, there is a unique Haar measure with ${\mu(G) = 1}$.

This follows by just noting that if ${\mu}$ is a Radon measure on a compact space ${X}$, then ${\mu(X) < \infty}$. This now lets us deduce that

Corollary 10 (Compact Lie groups are semisimple)

${\mathbb C[G]}$ is semisimple for any compact Lie group ${G}$.

Indeed, we can now consider

$\displaystyle \left< v,w\right> = \int_G \left< g \cdot v, g \cdot w\right>_0 \; dg$

as we described at the beginning.

## 3. The tangent space at the identity

In light of the previous comment about neighborhoods of ${e}$ generating ${G}$, we see that to get some information about the entire Lie group it actually suffices to just get “local” information of ${G}$ at the point ${e}$ (this is one formalization of the fact that Lie groups are super symmetric).

To do this one idea is to look at the tangent space. Let ${G}$ be an ${n}$-dimensional Lie group (over ${K}$) and consider ${\mathfrak g = T_eG}$ the tangent space to ${G}$ at the identity ${e \in G}$. Naturally, this is a ${K}$-vector space of dimension ${n}$. We call it the Lie algebra associated to ${G}$.

Example 11 (Lie algebras corresponding to Lie groups)

• ${(\mathbb R, +)}$ has a real Lie algebra isomorphic to ${\mathbb R}$.
• ${(\mathbb C, +)}$ has a complex Lie algebra isomorphic to ${\mathbb C}$.
• The unit circle ${S^1 \subseteq \mathbb C}$ has a real Lie algebra isomorphic to ${\mathbb R}$, which we think of as the “tangent line” at the point ${1 \in S^1}$.

Example 12 (${\mathfrak{gl}(n, K)}$)

Let’s consider ${\text{GL }(n, K) \subset K^{\oplus n^2}}$, an open subset of ${K^{\oplus n^2}}$. Its tangent space should just be an ${n^2}$-dimensional ${K}$-vector space. By identifying the components in the obvious way, we can think of this Lie algebra as just the set of all ${n \times n}$ matrices.

This Lie algebra goes by the notation ${\mathfrak{gl}(n, K)}$.

Example 13 (${\mathfrak{sl}(n, K)}$)

Recall ${\text{SL }(n, K) \subset \text{GL }(n, K)}$ is a Lie group of dimension ${n^2-1}$, hence its Lie algebra should have dimension ${n^2-1}$. To see what it is, let’s look at the special case ${n=2}$ first: then

$\displaystyle \text{SL }(2, K) = \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \mid ad - bc = 1 \right\}.$

Viewing this as the hypersurface ${f(a,b,c,d) = ad-bc = 1}$ in ${K^{\oplus 4}}$, we compute

$\displaystyle \nabla f = \left< d, -c, -b, a \right>$

and in particular the tangent space to the identity matrix ${\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}}$ is given by the orthogonal complement of the gradient

$\displaystyle \nabla f (1,0,0,1) = \left< 1, 0, 0, 1 \right>.$

Hence the tangent plane can be identified with matrices satisfying ${a+d=0}$. In other words, we see

$\displaystyle \mathfrak{sl}(2, K) = \left\{ T \in \mathfrak{gl}(2, K) \mid \text{Tr } T = 0 \right\}.$

By repeating this example in greater generality, we discover

$\displaystyle \mathfrak{sl}(n, K) = \left\{ T \in \mathfrak{gl}(n, K) \mid \text{Tr } T = 0 \right\}.$
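
This computation can also be checked with exact arithmetic: for a ${2 \times 2}$ matrix ${T}$ one has ${\det(I + \varepsilon T) = 1 + \varepsilon \,\text{Tr } T + \varepsilon^2 \det T}$, so to first order the condition to stay on the level set ${\det = 1}$ is exactly ${\text{Tr } T = 0}$. A short sketch (my own, with arbitrary sample matrices):

```python
from fractions import Fraction

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def check(T, eps):
    """Verify det(I + eps*T) = 1 + eps*Tr(T) + eps^2*det(T) exactly."""
    I_plus = [[1 + eps * T[0][0], eps * T[0][1]],
              [eps * T[1][0], 1 + eps * T[1][1]]]
    trace = T[0][0] + T[1][1]
    return det2(I_plus) == 1 + eps * trace + eps ** 2 * det2(T)

eps = Fraction(1, 1000)
assert check([[3, 1], [2, -5]], eps)     # a generic T
assert check([[4, 7], [1, -4]], eps)     # a trace-zero T: det = 1 + O(eps^2)
```
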

## 4. The exponential map

Right now, ${\mathfrak g}$ is just a vector space. However, by using the group structure we can get a map from ${\mathfrak g}$ back into ${G}$. The trick is “differential equations”:

Proposition 14 (Differential equations for Lie theorists)

Let ${G}$ be a Lie group over ${K}$ and ${\mathfrak g}$ its Lie algebra. Then for every ${x \in \mathfrak g}$ there is a unique homomorphism

$\displaystyle \gamma_x : K \rightarrow G$

which is a morphism of Lie groups, such that

$\displaystyle \gamma_x'(0) = x \in T_eG = \mathfrak g.$

We will write ${\gamma_x(t)}$ to emphasize the argument ${t \in K}$ being thought of as “time”. Thus this proposition should be intuitively clear: the theory of differential equations guarantees that ${\gamma_x}$ is defined and unique in a small neighborhood of ${0 \in K}$. Then, the group structure allows us to extend ${\gamma_x}$ uniquely to the rest of ${K}$, giving a trajectory across all of ${G}$. This is sometimes called a one-parameter subgroup of ${G}$, but we won’t use this terminology anywhere in what follows.

This lets us define:

Definition 15

Retain the setting of the previous proposition. Then the exponential map is defined by

$\displaystyle \exp : \mathfrak g \rightarrow G \qquad\text{by}\qquad x \mapsto \gamma_x(1).$

The exponential map gets its name from the fact that for all the examples I discussed before, it is actually just the map ${e^\bullet}$. Note that below, ${e^T = \sum_{k \ge 0} \frac{T^k}{k!}}$ for a matrix ${T}$; this is called the matrix exponential.

Example 16 (Exponential Maps of Lie algebras)

• If ${G = \mathbb R}$, then ${\mathfrak g = \mathbb R}$ too. We observe ${\gamma_x(t) = e^{tx} \in \mathbb R}$ (where ${t \in \mathbb R}$) is a morphism of Lie groups ${\gamma_x : \mathbb R \rightarrow G}$. Hence

$\displaystyle \exp : \mathbb R \rightarrow \underbrace{\mathbb R}_{=G} \qquad \exp(x) = \gamma_x(1) = e^x \in \mathbb R = G.$

• Ditto for ${\mathbb C}$.
• For ${S^1}$ and ${x \in \mathbb R}$, the map ${\gamma_x : \mathbb R \rightarrow S^1}$ given by ${t \mapsto e^{itx}}$ works. Hence

$\displaystyle \exp : \mathbb R \rightarrow S^1 \qquad \exp(x) = \gamma_x(1) = e^{ix} \in S^1.$

• For ${\text{GL }(n, K)}$, the map ${\gamma_X : K \rightarrow \text{GL }(n, K)}$ given by ${t \mapsto e^{tX}}$ works nicely (now ${X}$ is a matrix). (Note that we have to check ${e^{tX}}$ is actually invertible for this map to be well-defined.) Hence the exponential map is given by

$\displaystyle \exp : \mathfrak{gl}(n,K) \rightarrow \text{GL }(n,K) \qquad \exp(X) = \gamma_X(1) = e^X \in \text{GL }(n, K).$

• Similarly,

$\displaystyle \exp : \mathfrak{sl}(n,K) \rightarrow \text{SL }(n,K) \qquad \exp(X) = \gamma_X(1) = e^X \in \text{SL }(n, K).$

Here we had to check that if ${X \in \mathfrak{sl}(n,K)}$, meaning ${\text{Tr } X = 0}$, then ${\det(e^X) = 1}$. This can be seen by writing ${X}$ in an upper triangular basis.
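
This last fact is easy to test numerically, since more generally ${\det(e^X) = e^{\text{Tr } X}}$. Below is a sketch in plain Python (the truncated power series and the sample matrix are my own choices):

```python
import math

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm(X, terms=30):
    """Truncated power series e^X = sum_k X^k / k! (fine for small X)."""
    n = len(X)
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    power = [row[:] for row in result]
    for k in range(1, terms):
        power = mat_mul(power, X)
        for i in range(n):
            for j in range(n):
                result[i][j] += power[i][j] / math.factorial(k)
    return result

X = [[0.3, 1.2], [0.5, -0.3]]            # trace zero
E = expm(X)
det = E[0][0] * E[1][1] - E[0][1] * E[1][0]
assert abs(det - 1.0) < 1e-10            # det(e^X) = e^{Tr X} = e^0 = 1
```
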

Actually, taking the tangent space at the identity is a functor. Consider a map ${\varphi : G_1 \rightarrow G_2}$ of Lie groups, with Lie algebras ${\mathfrak g_1}$ and ${\mathfrak g_2}$. Because ${\varphi}$ is a group homomorphism, it sends ${e_1 \in G_1}$ to ${e_2 \in G_2}$. Now, from manifold theory we know that a map ${f : M \rightarrow N}$ between manifolds gives a linear map between the corresponding tangent spaces, say ${Tf : T_pM \rightarrow T_{f(p)}N}$. For us we obtain a linear map

$\displaystyle \varphi_\ast = T \varphi : \mathfrak g_1 \rightarrow \mathfrak g_2.$

In fact, this ${\varphi_\ast}$ fits into a commutative diagram with the exponential maps, namely ${\varphi \circ \exp_{G_1} = \exp_{G_2} \circ \varphi_\ast}$.

Here are a few more properties of ${\exp}$:

• ${\exp(0) = e \in G}$, which is immediate by looking at the constant trajectory ${\gamma_0(t) \equiv e}$.
• The total derivative of ${\exp}$ at the origin, ${D\exp_0 : \mathfrak g \rightarrow \mathfrak g}$, is the identity map. This is again by construction.
• In particular, by the inverse function theorem this implies that ${\exp}$ is a diffeomorphism in a neighborhood of ${0 \in \mathfrak g}$, onto a neighborhood of ${e \in G}$.
• ${\exp}$ commutes with morphisms of Lie groups, i.e. ${\varphi \circ \exp = \exp \circ \varphi_\ast}$. (By the above diagram.)

## 5. The commutator

Right now ${\mathfrak g}$ is still just a vector space, the tangent space. But now that there is map ${\exp : \mathfrak g \rightarrow G}$, we can use it to put a new operation on ${\mathfrak g}$, the so-called commutator.

The idea is follows: we want to “multiply” two elements of ${\mathfrak g}$. But ${\mathfrak g}$ is just a vector space, so we can’t do that. However, ${G}$ itself has a group multiplication, so we should pass to ${G}$ using ${\exp}$, use the multiplication in ${G}$ and then come back.

Here are the details. As we just mentioned, ${\exp}$ is a diffeomorphism near ${e \in G}$. So for ${x}$, ${y}$ close to the origin of ${\mathfrak g}$, we can look at ${\exp(x)}$ and ${\exp(y)}$, which are two elements of ${G}$ close to ${e}$. Multiplying them gives an element still close to ${e}$, so it's equal to ${\exp(z)}$ for some unique ${z}$, call it ${\mu(x,y)}$.

One can show in fact that ${\mu}$ can be written as a Taylor series in two variables as

$\displaystyle \mu(x,y) = x + y + \frac{1}{2} [x,y] + \text{third order terms} + \dots$

where ${[x,y]}$ is a skew-symmetric bilinear map, meaning ${[x,y] = -[y,x]}$. It will be more convenient to work with ${[x,y]}$ than ${\mu(x,y)}$ itself, so we give it a name:

Definition 17

This ${[x,y]}$ is called the commutator of ${G}$.
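
One can test this expansion numerically: approximating ${\exp(x)\exp(y)}$ by ${\exp\left(x + y + \frac{1}{2}[x,y]\right)}$ should beat ${\exp(x+y)}$ by an order of magnitude for small ${x}$, ${y}$. A sketch (the nilpotent ${2 \times 2}$ matrices and the scale are arbitrary choices of mine):

```python
import math

def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add(A, B, s=1.0):
    return [[A[i][j] + s * B[i][j] for j in range(2)] for i in range(2)]

def expm(X, terms=25):
    """Truncated power series e^X, accurate for small X."""
    R = [[1.0, 0.0], [0.0, 1.0]]
    P = [row[:] for row in R]
    for k in range(1, terms):
        P = mul(P, X)
        R = add(R, [[P[i][j] / math.factorial(k) for j in range(2)]
                    for i in range(2)])
    return R

def dist(A, B):
    return max(abs(A[i][j] - B[i][j]) for i in range(2) for j in range(2))

t = 0.01
x = [[0.0, t], [0.0, 0.0]]
y = [[0.0, 0.0], [t, 0.0]]
bracket = add(mul(x, y), mul(y, x), s=-1.0)                  # [x, y] = xy - yx
lhs = mul(expm(x), expm(y))
err_first = dist(lhs, expm(add(x, y)))                       # error O(t^2)
err_second = dist(lhs, expm(add(add(x, y), bracket, s=0.5))) # error O(t^3)
assert err_second < err_first / 10
```
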

Now we know multiplication in ${G}$ is associative, so this should give us some nontrivial relation on the bracket ${[,]}$. Specifically, since

$\displaystyle \exp(x) \left( \exp(y) \exp(z) \right) = \left( \exp(x) \exp(y) \right) \exp(z).$

we should have that ${\mu(x, \mu(y,z)) = \mu(\mu(x,y), z)}$, and this should tell us something. In fact, the claim is:

Theorem 18

The bracket ${[,]}$ satisfies the Jacobi identity

$\displaystyle [x,[y,z]] + [y,[z,x]] + [z,[x,y]] = 0.$

Proof: Although I won’t prove it, the third-order terms (and all the rest) in our definition of ${[x,y]}$ can be written out explicitly as well: for example, we actually have

$\displaystyle \mu(x,y) = x + y + \frac{1}{2} [x,y] + \frac{1}{12} \left( [x, [x,y]] + [y,[y,x]] \right) + \text{fourth order terms} + \dots.$

The general formula is called the Baker-Campbell-Hausdorff formula.

Then we can force ourselves to expand this using the first three terms of the BCH formula and then equate the degree three terms. The left-hand side expands initially as ${\mu\left( x, y + z + \frac{1}{2} [y,z] + \frac{1}{12} \left( [y,[y,z]] + [z,[z,y]] \right) \right)}$, and the next step would be something ugly.

This computation is horrifying and painful, so I’ll pretend I did it and tell you the end result is as claimed. $\Box$
There is a more natural way to see why this identity is the “right one”; see Qiaochu. However, with this proof I want to make the point that this Jacobi identity is not our decision: instead, the Jacobi identity is forced upon us by associativity in ${G}$.

Example 19 (Examples of commutators attached to Lie groups)

• If ${G}$ is an abelian group, we have ${[x,y] = -[y,x]}$ by skew-symmetry, while ${[x,y] = [y,x]}$ from ${\mu(x,y) = \mu(y,x)}$. Thus ${[x,y] = 0}$ in ${\mathfrak g}$ for any abelian Lie group ${G}$.
• In particular, the brackets for ${G \in \{\mathbb R, \mathbb C, S^1\}}$ are trivial.
• Let ${G = \text{GL }(n, K)}$. Then one can show that

$\displaystyle [T,S] = TS - ST \qquad \forall S, T \in \mathfrak{gl}(n, K).$

• Ditto for ${\text{SL }(n, K)}$.
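
For the matrix bracket ${[T,S] = TS - ST}$, the Jacobi identity can be confirmed by direct exact computation; here is a sketch on random integer matrices (an illustration, not a proof):

```python
import random

def mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def bracket(A, B):
    """The commutator [A, B] = AB - BA."""
    return sub(mul(A, B), mul(B, A))

random.seed(0)
rand = lambda: [[random.randint(-5, 5) for _ in range(3)] for _ in range(3)]
x, y, z = rand(), rand(), rand()
# [x,[y,z]] + [y,[z,x]] + [z,[x,y]] should be the zero matrix.
jacobi = add(add(bracket(x, bracket(y, z)),
                 bracket(y, bracket(z, x))),
             bracket(z, bracket(x, y)))
assert all(entry == 0 for row in jacobi for entry in row)
```
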

In any case, with the Jacobi identity in hand we can define a general Lie algebra as an intrinsic object with a Jacobi-satisfying bracket:

Definition 20

A Lie algebra over ${k}$ is a ${k}$-vector space equipped with a skew-symmetric bilinear bracket ${[,]}$ satisfying the Jacobi identity.

A morphism of Lie algebras is a linear map which preserves the bracket.

Note that a Lie algebra may even be infinite-dimensional (although since we are assuming ${G}$ is finite-dimensional, these will never come up as a tangent space).

Example 21 (Associative algebra ${\rightarrow}$ Lie algebra)

Any associative algebra ${A}$ over ${k}$ can be made into a Lie algebra by taking the same underlying vector space, and using the bracket ${[a,b] = ab - ba}$.

## 6. The fundamental theorems

We finish this list of facts by stating the three “fundamental theorems” of Lie theory. They are based upon the functor

$\displaystyle \mathscr{L} : G \mapsto T_e G$

we have described earlier, which is a functor

• from the category of Lie groups
• into the category of finite-dimensional Lie algebras.

The first theorem requires the following definition:

Definition 22

A Lie subgroup ${H}$ of a Lie group ${G}$ is a subgroup ${H}$ such that the inclusion map ${H \hookrightarrow G}$ is also an injective immersion.

A Lie subalgebra ${\mathfrak h}$ of a Lie algebra ${\mathfrak g}$ is a vector subspace preserved under the bracket (meaning that ${[\mathfrak h, \mathfrak h] \subseteq \mathfrak h}$).

Theorem 23 (Lie I)

Let ${G}$ be a real or complex Lie group with Lie algebra ${\mathfrak g}$. Then given a Lie subgroup ${H \subseteq G}$, the map

$\displaystyle H \mapsto \mathscr{L}(H) \subseteq \mathfrak g$

is a bijection between connected Lie subgroups of ${G}$ and Lie subalgebras of ${\mathfrak g}$.

Theorem 24 (The Lie functor is an equivalence of categories)

Restrict ${\mathscr{L}}$ to a functor

• from the category of simply connected Lie groups over ${K}$
• to the category of finite-dimensional Lie algebras over ${K}$.

Then

1. (Lie II) ${\mathscr{L}}$ is fully faithful, and
2. (Lie III) ${\mathscr{L}}$ is essentially surjective on objects.

If we drop the “simply connected” condition, we obtain a functor which is still faithful, but not full: non-isomorphic Lie groups can have isomorphic Lie algebras (one example is ${\text{SO }(3)}$ and ${\text{SU }(2)}$).

# Combinatorial Nullstellensatz and List Coloring

More than six months late, but here are notes from the combinatorial nullstellensatz talk I gave at the student colloquium at MIT. This was also my term paper for 18.434, “Seminar in Theoretical Computer Science”.

## 1. Introducing the choice number

One of the most fundamental problems in graph theory is that of a graph coloring, in which one assigns a color to every vertex of a graph so that no two adjacent vertices have the same color. The most basic invariant related to the graph coloring is the chromatic number:

Definition 1

A simple graph ${G}$ is ${k}$-colorable if it’s possible to properly color its vertices with ${k}$ colors. The smallest such ${k}$ is the chromatic number ${\chi(G)}$.

In this exposition we study a more general notion in which the set of permitted colors is different for each vertex, as long as at least ${k}$ colors are listed at each vertex. This leads to the notion of a so-called choice number, which was introduced by Erdös, Rubin, and Taylor.

Definition 2

A simple graph ${G}$ is ${k}$-choosable if it's possible to properly color its vertices given a list of ${k}$ colors at each vertex. The smallest such ${k}$ is the choice number ${\mathop{\mathrm{ch}}(G)}$.

Example 3

We have ${\mathop{\mathrm{ch}}(C_{2n}) = \chi(C_{2n}) = 2}$ for any integer ${n}$ (here ${C_{2n}}$ is the cycle graph on ${2n}$ vertices). To see this, we only have to show that given a list of two colors at each vertex of ${C_{2n}}$, we can select one of them.

• If the list of colors is the same at each vertex, then since ${C_{2n}}$ is bipartite, we are done.
• Otherwise, suppose adjacent vertices ${v_1}$, ${v_{2n}}$ are such that some color ${c}$ in the list at ${v_1}$ is not in the list at ${v_{2n}}$. Select ${c}$ at ${v_1}$, and then greedily color ${v_2}$, …, ${v_{2n}}$ in that order.
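
The case analysis above translates directly into code. Here is a sketch (function names and test lists are mine):

```python
def color_even_cycle(lists):
    """Properly color C_m (m even) given a set of 2 allowed colors per vertex."""
    m = len(lists)
    if all(L == lists[0] for L in lists):
        # Same list everywhere: 2-color the even (hence bipartite) cycle.
        a, b = sorted(lists[0])
        return [a if i % 2 == 0 else b for i in range(m)]
    for i in range(m):
        diff = lists[i] - lists[(i + 1) % m]
        if diff:
            # Some color c at vertex i is missing from the list at vertex i+1:
            # use c at i, then greedily color i-1, i-2, ..., down to i+1.
            coloring = [None] * m
            coloring[i] = next(iter(diff))
            for step in range(1, m):
                j = (i - step) % m
                nbr = (j + 1) % m                    # already colored
                coloring[j] = next(c for c in lists[j] if c != coloring[nbr])
            return coloring
    raise AssertionError("impossible: equal lists are handled above")

def is_proper(coloring):
    m = len(coloring)
    return all(coloring[k] != coloring[(k + 1) % m] for k in range(m))

lists = [{1, 2}, {2, 3}, {1, 3}, {1, 2}]             # C_4 with unequal lists
c = color_even_cycle(lists)
assert is_proper(c) and all(c[i] in lists[i] for i in range(4))
assert is_proper(color_even_cycle([{5, 7}] * 6))     # equal lists on C_6
```

The last vertex colored, ${v_{2n}}$ in the notation above, only has to avoid one colored neighbor, since the starting color is absent from its list.
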

We are thus naturally interested in how the choice number and the chromatic number are related. Of course we always have

$\displaystyle \mathop{\mathrm{ch}}(G) \ge \chi(G).$

Naïvely one might expect that we in fact have an equality, since allowing the colors at vertices to be different seems like it should make the graph easier to color. However, the following example shows that this is not the case.

Example 4 (Erdös)

Let ${n \ge 1}$ be an integer and define

$\displaystyle G = K_{n^n, n}.$

We claim that for any integer ${n \ge 1}$ we have

$\displaystyle \mathop{\mathrm{ch}}(G) \ge n+1 \quad\text{and}\quad \chi(G) = 2.$

The latter equality follows from ${G}$ being bipartite.

Now to see the first inequality, let ${G}$ have vertex set ${U \cup V}$, where ${U}$ is the set of functions ${u : [n] \rightarrow [n]}$ and ${V = [n]}$. Then consider ${n^2}$ colors ${C_{i,j}}$ for ${1 \le i, j \le n}$. On a vertex ${u \in U}$, we list colors ${C_{1,u(1)}}$, ${C_{2,u(2)}}$, …, ${C_{n,u(n)}}$. On a vertex ${v \in V}$, we list colors ${C_{v,1}}$, ${C_{v,2}}$, …, ${C_{v,n}}$. By construction it is impossible to properly color ${G}$ with these lists: whatever colors ${C_{v, j_v}}$ are chosen at the vertices ${v \in V}$, the vertex ${u}$ with ${u(v) = j_v}$ for each ${v}$ has every color on its list already used by a neighbor.

The case ${n = 3}$ is illustrated in the figure below (image in public domain).

This surprising behavior is the subject of much research: how can we bound the choice number of a graph as a function of its chromatic number and other properties of the graph? We see that the above example requires exponentially many vertices in ${n}$.

Theorem 5 (Noel, West, Wu, Zhu)

If ${G}$ is a graph with ${n}$ vertices then

$\displaystyle \chi(G) \le \mathop{\mathrm{ch}}(G) \le \max\left( \chi(G), \left\lceil \frac{\chi(G)+n-1}{3} \right\rceil \right).$

In particular, if ${n \le 2\chi(G)+1}$ then ${\mathop{\mathrm{ch}}(G) = \chi(G)}$.

One of the major open problems in this direction is the following conjecture.

Definition 6

A claw-free graph is a graph with no induced ${K_{3,1}}$. For example, the line graph (also called edge graph) of any simple graph ${G}$ is claw-free.

Conjecture: if ${G}$ is a claw-free graph, then ${\mathop{\mathrm{ch}}(G) = \chi(G)}$. In particular, this conjecture implies that for edge coloring, the notions of “chromatic number” and “choice number” coincide.

In this exposition, we prove the following result of Alon.

Theorem 7 (Alon)

A bipartite graph ${G}$ is ${\left( \left\lceil L(G) \right\rceil+1 \right)}$-choosable, where

$\displaystyle L(G) \overset{\mathrm{def}}{=} \max_{H \subseteq G} |E(H)|/|V(H)|$

is half the maximum of the average degree of subgraphs ${H}$.

In particular, recall that a planar bipartite graph ${H}$ with ${r \ge 3}$ vertices contains at most ${2r-4}$ edges. Thus for such graphs we have ${L(G) \le 2}$ and deduce:

Corollary 8

A planar bipartite graph is ${3}$-choosable.

This corollary is sharp, as it applies to ${K_{2,4}}$ which we have seen in Example 4 has ${\mathop{\mathrm{ch}}(K_{2,4}) = 3}$.

The rest of the paper is divided as follows. First, we begin in §2 by stating Theorem 9, the famous combinatorial nullstellensatz of Alon. Then in §3 and §4, we provide descriptions of the so-called graph polynomial, to which we then apply combinatorial nullstellensatz to deduce Theorem 18. Finally in §5, we show how to use Theorem 18 to prove Theorem 7.

## 2. Combinatorial Nullstellensatz

The main tool we use is the Combinatorial Nullstellensatz of Alon.

Theorem 9 (Combinatorial Nullstellensatz)

Let ${F}$ be a field, and let ${f \in F[x_1, \dots, x_n]}$ be a polynomial of degree ${t_1 + \dots + t_n}$. Let ${S_1, S_2, \dots, S_n \subseteq F}$ such that ${\left\lvert S_i \right\rvert > t_i}$ for all ${i}$.

Assume the coefficient of ${x_1^{t_1}x_2^{t_2}\dots x_n^{t_n}}$ of ${f}$ is not zero. Then we can pick ${s_1 \in S_1}$, \dots, ${s_n \in S_n}$ such that

$\displaystyle f(s_1, s_2, \dots, s_n) \neq 0.$

Example 10

Let us give a second proof that

$\displaystyle \mathop{\mathrm{ch}}(C_{2n}) = 2$

for every positive integer ${n}$. Our proof will be an application of the Nullstellensatz.

Regard the colors as real numbers, and let ${S_i}$ be the set of colors at vertex ${i}$ (hence ${1 \le i \le 2n}$, and ${|S_i| = 2}$). Consider the polynomial

$\displaystyle f = \left( x_1-x_2 \right)\left( x_2-x_3 \right) \dots \left( x_{2n-1}-x_{2n} \right)\left( x_{2n}-x_1 \right)$

The coefficient of ${x_1^1 x_2^1 \dots x_{2n}^1}$ in ${f}$ is ${2 \neq 0}$. Therefore, by Theorem 9 one can select a color from each ${S_i}$ so that ${f}$ does not vanish; such a selection is exactly a proper coloring.
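
This coefficient can be confirmed by brute force, expanding the product over all ${2^{2n}}$ ways to choose one term from each factor (a short sketch of mine):

```python
from itertools import product

def cycle_coefficient(m):
    """Coefficient of x_1 x_2 ... x_m in (x_1-x_2)(x_2-x_3)...(x_m-x_1)."""
    edges = [(i, (i + 1) % m) for i in range(m)]   # 0-indexed cycle edges
    total = 0
    # From each factor (x_i - x_j) choose x_i (index 0, sign +)
    # or x_j (index 1, sign -).
    for choice in product(range(2), repeat=m):
        picked = [edges[k][choice[k]] for k in range(m)]
        if sorted(picked) == list(range(m)):        # multilinear term x_1...x_m
            total += (-1) ** sum(choice)
    return total

assert cycle_coefficient(4) == 2    # C_4
assert cycle_coefficient(6) == 2    # C_6
```

Only the two "all first" and "all second" selections survive, matching the two cyclic orientations of ${C_{2n}}$.
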

## 3. The Graph Polynomial, and Directed Orientations

Motivated by Example 10, we wish to apply a similar technique to general graphs ${G}$. So in what follows, let ${G}$ be a (simple) graph with vertex set ${\{1, \dots, n\}}$.

Definition 11

The graph polynomial of ${G}$ is defined by

$\displaystyle f_G(x_1, \dots, x_n) = \prod_{\substack{(i,j) \in E(G) \\ i < j}} (x_i-x_j).$

We observe that coefficients of ${f_G}$ correspond to differences in directed orientations. To be precise, we introduce the notation:

Definition 12

Consider orientations of the graph ${G}$ with vertex set ${\{1, \dots, n\}}$, meaning we assign a direction ${v \rightarrow w}$ to every edge of ${G}$ to make it into a directed graph. An oriented edge ${v \rightarrow w}$ is called ascending if ${v \le w}$, i.e. the edge points from the smaller number to the larger one.

Then we say that an orientation is

• even if there are an even number of ascending edges, and
• odd if there are an odd number of ascending edges.

Finally, we define

• ${\mathop{\mathrm{DE}}_G(d_1, \dots, d_n)}$ to be the set of all even orientations of ${G}$ in which vertex ${i}$ has indegree ${d_i}$.
• ${\mathop{\mathrm{DO}}_G(d_1, \dots, d_n)}$ to be the set of all odd orientations of ${G}$ in which vertex ${i}$ has indegree ${d_i}$.

Set ${\mathop{\mathrm{D}}_G(d_1,\dots,d_n) = \mathop{\mathrm{DE}}_G(d_1,\dots,d_n) \cup \mathop{\mathrm{DO}}_G(d_1,\dots,d_n)}$.

Example 13

Consider the graph on vertices ${\{1,2,3,4\}}$ with edges ${\{12, 23, 24, 34\}}$, oriented as ${1 \rightarrow 2}$, ${3 \rightarrow 2}$, ${2 \rightarrow 4}$, ${4 \rightarrow 3}$.

There are exactly two ascending edges, namely ${1 \rightarrow 2}$ and ${2 \rightarrow 4}$. The indegrees are ${d_1 = 0}$, ${d_2 = 2}$ and ${d_3 = d_4 = 1}$. Therefore, this particular orientation is an element of ${\mathop{\mathrm{DE}}_G(0,2,1,1)}$. In terms of ${f_G}$, this corresponds to the choice of terms

$\displaystyle \left( x_1- \boldsymbol{x_2} \right) \left( \boldsymbol{x_2}-x_3 \right) \left( x_2-\boldsymbol{x_4} \right) \left( \boldsymbol{x_3}-x_4 \right)$

which is a ${+ x_2^2 x_3 x_4}$ term.

Lemma 14

In the graph polynomial of ${G}$, the coefficient of ${x_1^{d_1} \dots x_n^{d_n}}$ is

$\displaystyle \left\lvert \mathop{\mathrm{DE}}_G(d_1, \dots, d_n) \right\rvert - \left\lvert \mathop{\mathrm{DO}}_G(d_1, \dots, d_n) \right\rvert.$

Proof: Consider expanding ${f_G}$. Each expanded term corresponds to a choice of ${x_i}$ or ${x_j}$ from each factor ${(x_i-x_j)}$, which we view as orienting the edge towards the chosen variable, as in Example 13. The term has coefficient ${+1}$ if the orientation is even, and ${-1}$ if the orientation is odd, as desired. $\Box$

Thus we have an explicit combinatorial description of the coefficients in the graph polynomial ${f_G}$.
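
For a small graph, Lemma 14 can be checked by computer: expand ${f_G}$ directly, and separately tally orientations by indegree sequence with sign ${(-1)^{\#\text{ascending}}}$. Here is a sketch on the graph of Example 13 (a hand-rolled expansion, no libraries assumed):

```python
from itertools import product

n = 4
edges = [(0, 1), (1, 2), (1, 3), (2, 3)]   # Example 13's graph, 0-indexed, i < j

# Expand f_G = prod (x_i - x_j) as a dict {exponent tuple: coefficient}.
poly = {(0,) * n: 1}
for (i, j) in edges:
    new = {}
    for exps, coef in poly.items():
        for pick, sign in ((i, 1), (j, -1)):
            e2 = list(exps)
            e2[pick] += 1
            key = tuple(e2)
            new[key] = new.get(key, 0) + sign * coef
    poly = new

# Independently count orientations, grouped by indegree sequence,
# with sign (-1)^(number of ascending edges).
counts = {}
for heads in product(*edges):                # choose a head for every edge
    indeg = [0] * n
    ascending = 0
    for (i, j), h in zip(edges, heads):
        indeg[h] += 1
        if h == j:                           # oriented i -> j with i < j
            ascending += 1
    key = tuple(indeg)
    counts[key] = counts.get(key, 0) + (-1) ** ascending

assert all(poly.get(k, 0) == counts.get(k, 0) for k in set(poly) | set(counts))
```
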

## 4. Coefficients via Eulerian Suborientations

We now give a second description of the coefficients of ${f_G}$.

Definition 15

Let ${D \in \mathop{\mathrm{D}}_G(d_1, \dots, d_n)}$, viewed as a directed graph. An Eulerian suborientation of ${D}$ is a subgraph of ${D}$ (not necessarily induced) in which every vertex has equal indegree and outdegree. We say that such a suborientation is

• even if it has an even number of edges, and
• odd if it has an odd number of edges.

Note that the empty suborientation is allowed. We denote the even and odd Eulerian suborientations of ${D}$ by ${\mathop{\mathrm{EE}}(D)}$ and ${\mathop{\mathrm{EO}}(D)}$, respectively.

Eulerian suborientations are brought into the picture by the following lemma.

Lemma 16

Assume ${D \in \mathop{\mathrm{DE}}_G(d_1, \dots, d_n)}$. Then there are natural bijections

$\displaystyle \begin{aligned} \mathop{\mathrm{DE}}_G(d_1, \dots, d_n) &\rightarrow \mathop{\mathrm{EE}}(D) \\ \mathop{\mathrm{DO}}_G(d_1, \dots, d_n) &\rightarrow \mathop{\mathrm{EO}}(D). \end{aligned}$

Similarly, if ${D \in \mathop{\mathrm{DO}}_G(d_1, \dots, d_n)}$ then there are bijections

$\displaystyle \begin{aligned} \mathop{\mathrm{DE}}_G(d_1, \dots, d_n) &\rightarrow \mathop{\mathrm{EO}}(D) \\ \mathop{\mathrm{DO}}_G(d_1, \dots, d_n) &\rightarrow \mathop{\mathrm{EE}}(D). \end{aligned}$

Proof: Consider any orientation ${D' \in \mathop{\mathrm{D}}_G(d_1, \dots, d_n)}$. Then we define a suborientation of ${D}$, denoted ${D \rtimes D'}$, by including exactly the edges of ${D}$ whose orientation in ${D'}$ is in the opposite direction. It’s easy to see that this induces a bijection

$\displaystyle D \rtimes - : \mathop{\mathrm{D}}_G(d_1, \dots, d_n) \rightarrow \mathop{\mathrm{EE}}(D) \cup \mathop{\mathrm{EO}}(D).$

Moreover, remark that

• ${D \rtimes D'}$ is even if ${D}$ and ${D'}$ are either both even or both odd, and
• ${D \rtimes D'}$ is odd otherwise.

The lemma follows from this. $\Box$

Corollary 17

In the graph polynomial of ${G}$, the coefficient of ${x_1^{d_1} \dots x_n^{d_n}}$ is

$\displaystyle \pm \left( \left\lvert \mathop{\mathrm{EE}}(D) \right\rvert - \left\lvert \mathop{\mathrm{EO}}(D) \right\rvert \right)$

where ${D \in \mathop{\mathrm{D}}_G(d_1, \dots, d_n)}$ is arbitrary.

Proof: Combine Lemma 14 and Lemma 16. $\Box$

We now arrive at the main result:

Theorem 18

Let ${G}$ be a graph on ${\{1, \dots, n\}}$, and let ${D \in \mathop{\mathrm{D}}_G(d_1, \dots, d_n)}$ be an orientation of ${G}$. If ${\left\lvert \mathop{\mathrm{EE}}(D) \right\rvert \neq \left\lvert \mathop{\mathrm{EO}}(D) \right\rvert}$, then given a list of ${d_i+1}$ colors at each vertex of ${G}$, there exists a proper coloring of the vertices of ${G}$.

In particular, ${G}$ is ${(1+\max_i d_i)}$-choosable.

Proof: Combine Corollary 17 with Theorem 9. $\Box$

## 5. Finding an orientation

Armed with Theorem 18, we are almost ready to prove Theorem 7. The last ingredient is that we need to find an orientation on ${G}$ in which the maximal degree is not too large. This is accomplished by the following.

Lemma 19

Let ${L(G) \overset{\mathrm{def}}{=} \max_{H \subseteq G} |E(H)|/|V(H)|}$ as in Theorem 7. Then ${G}$ has an orientation in which every indegree is at most ${\left\lceil L(G) \right\rceil}$.

Proof: This is an application of Hall’s marriage theorem.

Let ${d = \left\lceil L(G) \right\rceil \ge L(G)}$. Construct a bipartite graph

$\displaystyle E \cup X \qquad \text{where}\qquad E = E(G) \quad\text{ and }\quad X = \underbrace{V(G) \sqcup \dots \sqcup V(G)}_{d \text{ times}}.$

Connect ${e \in E}$ and ${v \in X}$ if ${v}$ is an endpoint of ${e}$. We claim Hall's condition holds: any set ${S \subseteq E}$ of edges spans a subgraph ${H}$ with ${|S| \le |E(H)| \le L(G) \cdot |V(H)| \le d \cdot |V(H)|}$, and ${d \cdot |V(H)|}$ is exactly the number of neighbors of ${S}$ in ${X}$. So by Hall's marriage theorem we can match each edge in ${E}$ to a (copy of some) vertex in ${X}$. Orienting each edge towards its matched vertex, the conclusion follows since there are exactly ${d}$ copies of each vertex in ${X}$. $\Box$
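
The matching step in this proof can be carried out with a standard augmenting-path algorithm. Here is a sketch (the function name is mine, and the bound ${d}$ is taken as an input rather than computed from ${L(G)}$):

```python
def bounded_indegree_orientation(edges, d):
    """Return a list of (tail, head) pairs with every indegree <= d, or None.

    Matches each edge to one of d copies of an endpoint, via Kuhn's
    augmenting-path algorithm for bipartite matching.
    """
    match = {}                                 # (vertex, copy) -> edge index

    def try_place(e, visited):
        u, v = edges[e]
        for w in (u, v):
            for i in range(d):
                slot = (w, i)
                if slot in visited:
                    continue
                visited.add(slot)
                if slot not in match or try_place(match[slot], visited):
                    match[slot] = e
                    return True
        return False

    for e in range(len(edges)):
        if not try_place(e, set()):
            return None                        # Hall's condition fails: d too small
    head = {}
    for (w, _), e in match.items():
        head[e] = w                            # edge e points into w
    return [(u if head[e] == v else v, head[e])
            for e, (u, v) in enumerate(edges)]

# C_4 admits an orientation with all indegrees <= 1 (a directed cycle),
# but none with all indegrees 0.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
orientation = bounded_indegree_orientation(cycle, 1)
indeg = {}
for tail, h in orientation:
    indeg[h] = indeg.get(h, 0) + 1
assert all(count <= 1 for count in indeg.values())
assert bounded_indegree_orientation(cycle, 0) is None
```
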

Now we can prove Theorem 7. Proof: According to Lemma 19, pick ${D \in \mathop{\mathrm{D}}_G(d_1, \dots, d_n)}$ where ${\max d_i \le \left\lceil L(G) \right\rceil}$. Since ${G}$ is bipartite, we have ${\mathop{\mathrm{EO}}(D) = \varnothing}$: every Eulerian suborientation decomposes into directed cycles, and since ${G}$ has no odd cycles each of these has even length, so every Eulerian suborientation has an even number of edges. Meanwhile the empty suborientation is even, so ${\left\lvert \mathop{\mathrm{EE}}(D) \right\rvert \ge 1 > 0 = \left\lvert \mathop{\mathrm{EO}}(D) \right\rvert}$. So Theorem 18 applies and we are done. $\Box$

# SysRq on Arch Linux Mac Mini

This post documents my adventures in getting the SysRq key working on my Mac Mini and MacBook (both running Arch Linux). The suggestions of loadkeys and keyfuzz that are the first search results didn't work for me, so some more sophisticated black magic was necessary.

## Remapping the Fn keys

This step is technically optional, but I did it because the function keys are a pain anyway. Normally on Apple keyboards one needs to hold the Fn key to get the function keys to behave as ordinary F1, F2, … keystrokes. I prefer to reverse this behavior, so that the SysRq combination is Alt+F13+F rather than Fn+Alt+F13+F, say.

For this, the advice on the Arch Wiki worked, although it glosses over some points that I think should have been spelled out. On newer kernels, one does this by creating the file /etc/modprobe.d/hid_apple.conf and writing

options hid_apple fnmode=2


Then I edited the file /etc/mkinitcpio.conf to include the new file:

...
BINARIES=""

# FILES
# This setting is similar to BINARIES above, however, files are added
# as-is and are not parsed in any way.  This is useful for config files.
FILES="/etc/modprobe.d/hid_apple.conf"

# HOOKS
...


Finally, regenerate the initramfs for this change to take effect. On Arch Linux one can do this by issuing the command

$ sudo mkinitcpio -p linux


To find the scancode of the key I wanted to remap, I used evtest:

$ sudo evtest
No device specified, trying to scan all of /dev/input/event*
Available devices:
/dev/input/event0:  Logitech USB Receiver
/dev/input/event1:  Logitech USB Receiver
/dev/input/event2:  Apple, Inc Apple Keyboard
/dev/input/event3:  Apple, Inc Apple Keyboard
/dev/input/event4:  Apple Computer, Inc. IR Receiver
/dev/input/event5:  HDA NVidia Headphone
/dev/input/event6:  HDA NVidia HDMI/DP,pcm=3
/dev/input/event7:  Power Button
/dev/input/event8:  Sleep Button
/dev/input/event9:  Power Button
/dev/input/event10: Video Bus
/dev/input/event11: PC Speaker
/dev/input/event12: HDA NVidia HDMI/DP,pcm=7
/dev/input/event13: HDA NVidia HDMI/DP,pcm=8
Select the device event number [0-13]: 2
Input driver version is 1.0.1
Input device ID: bus 0x3 vendor 0x5ac product 0x220 version 0x111
Input device name: "Apple, Inc Apple Keyboard"


This is on my Mac Mini; the list of devices looks different on my laptop. After this, pressing the desired key yields something like

Event: time 1456870457.844237, -------------- SYN_REPORT ------------
Event: time 1456870457.924097, type 4 (EV_MSC), code 4 (MSC_SCAN), value 70068
Event: time 1456870457.924097, type 1 (EV_KEY), code 183 (KEY_F13), value 1


This is the F13 key which I want to map to SysRq — the scancode 70068 above (which is in fact a hex code) is the one I wanted.

## Using udev

Now that I had the scancode, I cd’ed to /etc/udev/hwdb.d and added a file 90-keyboard-sysrq.hwdb with the content

evdev:input:b0003*
 KEYBOARDKEY_70068=sysrq


One then updates hwdb.bin by running the commands

$ sudo udevadm hwdb --update
$ sudo udevadm trigger


The latter command makes the changes take effect immediately. You should be able to test this by running sudo evtest again; evtest should now report the new keycode (but the same scancode). One can test the SysRq key by pressing Alt+SysRq+H, and then checking the dmesg output to see if anything happened:

$ dmesg | tail -n 1
[  283.001240] sysrq: SysRq : HELP : loglevel(0-9) reboot(b) crash(c) ...


## Enable SysRq

It remains to actually enable SysRq, according to the bitmask described here. My system default was apparently 16:

\$ sysctl kernel.sysrq
kernel.sysrq = 16


For my purposes, I then edited /etc/sysctl.d/99-sysctl.conf and added the line

kernel.sysrq=254


This gave me everything except the nicing of real-time tasks. Of course the choice of value here is just personal preference.
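For reference, 254 decomposes under the kernel’s sysrq bitmask as follows (the annotations are mine, taken from the kernel’s sysrq documentation):

```python
# sysrq bitmask bits, per the kernel's sysrq documentation.
# 254 = 2 + 4 + 8 + 16 + 32 + 64 + 128: everything except
# bit 256 ("nicing of all RT tasks"), matching the text above.
SYSRQ_BITS = {
    2: "control of console logging level",
    4: "control of keyboard (SAK, unraw)",
    8: "debugging dumps of processes",
    16: "sync command",
    32: "remount read-only",
    64: "signalling of processes (term, kill, oom-kill)",
    128: "reboot/poweroff",
}
print(sum(SYSRQ_BITS))  # sums the keys: 254
```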

Personally, my main use for this is killing Chromium, which has a bad habit of freezing up my computer (especially if Firefox is open too). I remedy the situation by repeatedly pressing Alt+SysRq+F to kill off the memory hogs. If that doesn’t work, Alt+SysRq+K kills off all the processes in the current TTY.

# Algebraic Topology Functors

This will be old news to anyone who does algebraic topology, but oddly enough I can’t seem to find it all written in one place anywhere, and in particular I can’t find the bit about ${\mathsf{hPairTop}}$ at all.

In algebraic topology you (for example) associate every topological space ${X}$ with a group, like ${\pi_1(X, x_0)}$ or ${H_5(X)}$. All of these operations turn out to be functors. This isn’t surprising, because as far as I’m concerned the definition of a functor is “any time you take one type of object and naturally make another object”.

The surprise is that these objects also respect homotopy in a nice way; proving this is a fair amount of the “setup” work in algebraic topology.

## 1. Homology, ${H_n : \mathsf{hTop} \rightarrow \mathsf{Grp}}$

Note that ${H_5}$ is a functor

$\displaystyle H_5 : \mathsf{Top} \rightarrow \mathsf{Grp}$

i.e. to every space ${X}$ we can associate a group ${H_5(X)}$. (Of course, replace ${5}$ by an integer of your choice.) Recall that:

Definition 1

Two maps ${f, g : X \rightarrow Y}$ are homotopic if there exists a homotopy between them.

Thus for a map we can take its homotopy class ${[f]}$ (the equivalence class under this relationship). This has the nice property that ${[f \circ g] = [f] \circ [g]}$ and so on.

Definition 2

Two spaces ${X}$ and ${Y}$ are homotopic if there exists a pair of maps ${f : X \rightarrow Y}$ and ${g : Y \rightarrow X}$ such that ${[f \circ g] = [\mathrm{id}_Y]}$ and ${[g \circ f] = [\mathrm{id}_X]}$.

In light of this, we can define

Definition 3

The category ${\mathsf{hTop}}$ is defined as follows:

• The objects are topological spaces ${X}$.
• The morphisms ${X \rightarrow Y}$ are homotopy classes of continuous maps ${X \rightarrow Y}$.

Remark 4

Composition is well-defined since ${[f \circ g] = [f] \circ [g]}$. Two spaces are isomorphic in ${\mathsf{hTop}}$ if they are homotopic.
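To spell this out (a routine verification, not spelled out in the original): if ${F : [0,1] \times Y \rightarrow Z}$ is a homotopy from ${f}$ to ${f'}$ and ${G : [0,1] \times X \rightarrow Y}$ is a homotopy from ${g}$ to ${g'}$, then

$\displaystyle H(t,x) = F\left(t, G(t,x)\right)$

is continuous and gives a homotopy from ${f \circ g}$ to ${f' \circ g'}$; hence ${[f] \circ [g] := [f \circ g]}$ does not depend on the choice of representatives.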

Remark 5

As you might guess this “quotient” construction is called a quotient category.

Then the big result is that:

Theorem 6

The induced map ${f_\sharp = H_n(f)}$ of a map ${f: X \rightarrow Y}$ depends only on the homotopy class of ${f}$. Thus ${H_n}$ is a functor

$\displaystyle H_n : \mathsf{hTop} \rightarrow \mathsf{Grp}.$

The proof of this is geometric, using the so-called prism operators. In any case, as with all functors we deduce

Corollary 7

${H_n(X) \cong H_n(Y)}$ if ${X}$ and ${Y}$ are homotopic.

In particular, the contractible spaces are those spaces ${X}$ which are homotopy equivalent to a point; in that case, ${H_n(X) = 0}$ for all ${n \ge 1}$.

## 2. Relative Homology, ${H_n : \mathsf{hPairTop} \rightarrow \mathsf{Grp}}$

In fact, we also defined homology groups

$\displaystyle H_n(X,A)$

for ${A \subseteq X}$. We will now show this is functorial too.

Definition 8

Let ${\varnothing \neq A \subset X}$ and ${\varnothing \neq B \subset Y}$ be subspaces, and consider a map ${f : X \rightarrow Y}$. If ${f(A) \subseteq B}$ we write

$\displaystyle f : (X,A) \rightarrow (Y,B).$

We say ${f}$ is a map of pairs, between the pairs ${(X,A)}$ and ${(Y,B)}$.

Definition 9

We say that ${f,g : (X,A) \rightarrow (Y,B)}$ are pair-homotopic if they are “homotopic through maps of pairs”.

More formally, a pair-homotopy between ${f, g : (X,A) \rightarrow (Y,B)}$ is a map ${F : [0,1] \times X \rightarrow Y}$, which we’ll write as ${F_t : X \rightarrow Y}$, such that ${F}$ is a homotopy of the maps ${f,g : X \rightarrow Y}$ and each ${F_t}$ is itself a map of pairs.

Thus, we naturally arrive at two categories:

• ${\mathsf{PairTop}}$, the category of pairs of topological spaces, and
• ${\mathsf{hPairTop}}$, the same category except that the morphisms are pair-homotopy classes of maps.

Definition 10

As before, we say pairs ${(X,A)}$ and ${(Y,B)}$ are pair-homotopy equivalent if they are isomorphic in ${\mathsf{hPairTop}}$; such an isomorphism is called a pair-homotopy equivalence.

Then, the prism operators now let us derive

Theorem 11

We have a functor

$\displaystyle H_n : \mathsf{hPairTop} \rightarrow \mathsf{Grp}.$

The usual corollaries apply.

Now, we want an analog of contractible spaces for our pairs: i.e. pairs of spaces ${(X,A)}$ such that ${H_n(X,A) = 0}$ for ${n \ge 1}$. The correct definition is:

Definition 12

Let ${A \subset X}$. We say that ${A}$ is a deformation retract of ${X}$ if there is a map of pairs ${r : (X, A) \rightarrow (A, A)}$ which is a pair-homotopy equivalence.

Example 13 (Examples of Deformation Retracts)

1. If a single point ${p}$ is a deformation retract of a space ${X}$, then ${X}$ is contractible, since the retraction ${r : X \rightarrow \{\ast\}}$ (when viewed as a map ${X \rightarrow X}$) is homotopic to the identity map ${\mathrm{id}_X : X \rightarrow X}$.
2. The punctured disk ${D^2 \setminus \{0\}}$ deformation retracts onto its boundary ${S^1}$.
3. More generally, ${D^{n} \setminus \{0\}}$ deformation retracts onto its boundary ${S^{n-1}}$.
4. Similarly, ${\mathbb R^n \setminus \{0\}}$ deformation retracts onto a sphere ${S^{n-1}}$.
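For the last example, an explicit formula (a standard one, not from the original) exhibits the deformation retraction: the map

$\displaystyle F_t(x) = (1-t)x + t \cdot \frac{x}{\left\lVert x \right\rVert}$

is the identity at ${t = 0}$, lands in ${S^{n-1}}$ at ${t = 1}$, and fixes ${S^{n-1}}$ pointwise for every ${t}$; moreover ${F_t(x)}$ is a positive scalar multiple of ${x}$, so it never passes through ${0}$.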

Of course in this situation we have that

$\displaystyle H_n(X,A) \cong H_n(A,A) = 0.$

## 3. Homotopy, ${\pi_1 : \mathsf{hTop}_\ast \rightarrow \mathsf{Grp}}$

As a special case of the above, we define

Definition 14

The category ${\mathsf{Top}_\ast}$ is defined as follows:

• The objects are pairs ${(X, x_0)}$ of spaces ${X}$ with a distinguished basepoint ${x_0}$. We call these pointed spaces.
• The morphisms are maps ${f : (X, x_0) \rightarrow (Y, y_0)}$, meaning ${f}$ is continuous and ${f(x_0) = y_0}$.

Now again we mod out:

Definition 15

Two maps ${f , g : (X, x_0) \rightarrow (Y, y_0)}$ of pointed spaces are homotopic if there is a homotopy between them which also fixes the basepoints. We can then, in the same way as before, define the quotient category ${\mathsf{hTop}_\ast}$.

And lo and behold:

Theorem 16

We have a functor

$\displaystyle \pi_1 : \mathsf{hTop}_\ast \rightarrow \mathsf{Grp}.$

Same corollaries as before.

# Notes on Publishing My Textbook

Hmm, so hopefully this will be finished within the next 10 years.

— An email of mine at the beginning of this project

My Euclidean geometry book was published last March or so. I thought I’d take the time to write about what the whole process of publishing this book was like, but I’ll start with the disclaimer that my process was probably not very typical and is unlikely to be representative of what everyone else does.

## Writing the Book

### The Idea

I’m trying to pin-point exactly when this project changed from “daydream” to “let’s do it”, but I’m not quite sure; here’s the best I can recount.

It was sometime in the fall of 2013, towards the start of the school year; I think late September. I was a senior in high school, and I was only enrolled in two classes. It was fantastic, because it meant I had lots of time to study math. The superintendent of the school eventually found out, though, and forced me to enroll as an “office assistant” for three periods a day. Nonetheless, office assistant is not a very busy job, and so I had lots of time, all the time, every day.

Anyways, I had written a bit of geometry material for my math club the previous year, which was intended to be a light introduction. But in doing so I realized that there was much, much more I wanted to say, and so somewhere on my mental to-do list I added “flesh these notes out”. So one day, sitting in the office, after having spent another hour playing StarCraft, I finally got down to this item on the list. I hadn’t meant it to be a book; I just wanted to finish what I had started the previous year. But sometimes your own projects spiral out of your control, and that’s what happened to me.

Really, I hadn’t come up with a brilliant idea that no one had thought of before. To my knowledge, no one had even tried yet. If I hadn’t gone and decided to write this book, someone else would have done it; maybe not right away, but within a few years. Indeed, I was honestly surprised that I was the first one to make an attempt. The USAMO has been a serious contest since at least the 1990s, and the demand for this book certainly existed well before my time. Really, I think this all just goes to illustrate that the Efficient Market Hypothesis is not so true in these kinds of domains.

### Setting Out

Initially, this text was titled A Voyage in Euclidean Geometry, and the filename Voyage.pdf persisted through the entire project even though the title itself changed several times.

The beginning of the writing was actually quite swift. Like everyone else, I started out with an empty LaTeX file. But it was different from the blank screens I’d faced before; rather than staring in despair (think English essay mode), I exploded. I was bursting with things I wanted to write, the result of having years of competitive geometry bottled up in my head. In fact, I still have the version 0 of the table of contents that came to life as I started putting things together.

• Angle Chasing (include “Fact 5”)
• Centers of the Triangle
• The Medial Triangle
• The Euler Line
• The Nine-Point Circle
• Circles
• Incircles and Excircles
• The Power of a Point
• Computational Geometry
• All the Areas (include Extended Sine Law, Ceva/Menelaus)
• Similar Triangles
• Homothety
• Stewart’s Theorem
• Ptolemy’s Theorem
• Some More Configurations (include symmedians)
• Simson lines
• Incircles and Excenters, Revisited
• Midpoints of Altitudes
• Circles Again
• Inversion
• Circles Inscribed in Segments
• The Miquel Point (include Brokard, this could get long)
• Spiral Similarity
• Projective Geometry
• Harmonic Division
• Brokard’s Theorem
• Pascal’s Theorem
• Computational Techniques
• Complex Numbers
• Barycentric Coordinates

Of course the table of contents changed drastically over time, but that wasn’t important. The point of the initial skeleton was to provide a bucket sort for all the things that I wanted to cover. Often, I would have three different sections I wanted to write, but like all humans I can only write one thing at a time, so I would have to create section headers for the other two and try to get the first section done as quickly as I could so that I could go and write the other two as well.

I did take the time to do some things correctly, mostly LaTeX. Some examples of things I did:

• Set up proper amsthm environments: earlier versions of the draft had “lemma”, “theorem”, “problem”, “exercise”, “proposition”, all distinct
• Set up an organized master LaTeX file with \include’s for the chapters, rather than having just one fat file.
• Set up shortcuts for setting up diagrams and so on.
• Set up a “hints” system where hints to the problems would be printed in random order at the end of the book.
• Set up a special command for new terms (\vocab). At the beginning all it did was make the text bold, but I suspected that later I might make it do other things (like indexing).

In other words, whenever possible I would pay O(1) cost to get back O(n) returns. Indeed the point of using LaTeX for a long document is so that you can “say what you mean”: you type \begin{theorem} … \end{theorem}, and all the formatting is taken care of for you. Decide you want to change it later, and you only have to change the relevant code in the beginning.
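As an illustration, a preamble along these lines might look like the following (a sketch based on the description above; the actual definitions in the book’s source are surely more elaborate):

```latex
\usepackage{amsthm}
\usepackage{makeidx}

\theoremstyle{plain}
\newtheorem{theorem}{Theorem}[chapter]
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{proposition}[theorem]{Proposition}
\theoremstyle{definition}
\newtheorem{problem}[theorem]{Problem}
\newtheorem{exercise}[theorem]{Exercise}

% New terms: starts as plain bold, but since every new term goes
% through this one macro, adding indexing later is a one-line change:
\newcommand{\vocab}[1]{\textbf{#1}}
% e.g. later: \renewcommand{\vocab}[1]{\textbf{#1}\index{#1}}
```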

And so, for three hours a day, five days a week, I sat in the main office of Irvington High School, pounding out chapter after chapter. I was essentially typing up what had been four years of competition experience; when you’re 17 years old, that’s a big chunk of your life.

I spent surprisingly little time revising (before first submission). Mostly I just fired away. I have always heard about how important it is to rewrite things and how first drafts are always terrible, but I’m glad I ignored that advice, at least at the beginning. It was immensely helpful to have the skeleton of the book laid out in a tangible form that I could actually see. That’s one thing I really like about writing: it helps you collect your thoughts.

It’s possible that this is part of my writing style; compared to what everyone says I should do, I don’t do very much rewriting. My first and final drafts tend to look pretty similar. I think this is just because when I write something that’s not an English essay, I already have a reasonably good idea what I want to say, and that the process of writing it out does much of the polishing for me. I’m also typically pretty hesitant when I write things: I do a lot of pausing for a few minutes deciding whether this sentence is really what I want before actually writing it down, even in drafts.

### Some Encouragement

By late October, I had about 80 pages of content written. Not that impressive if you think about it; I think it works out to something like 4 pages per day. In fact, looking through my data, I’m pretty sure I had a pretty consistent writing rate of about 30 minutes per page. It didn’t matter, since I had so much time.

At this point, I was beginning to think about possibly publishing the book, since it was coming out reasonably well. It was a bit embarrassing, since as far as I could tell, publishing books was done by people who were actually professionals in some way or another. So I reached out to a couple of teachers of mine (not high school) who I knew had published textbooks in one form or another; I politely asked them what their thoughts were, and if they had any advice. I got some gentle encouragement, but also a pointer to self-publishing: turns out in this day and age, there are services like Lulu or CreateSpace that will just let you publish… whatever you want. This gave me the guts to keep working on this, because it meant that there was a minimal floor: even if I couldn’t get a traditional publisher, the worst I could do was self-publish through Amazon, which was at any rate strictly better than the plan of uploading a PDF somewhere.

So I kept writing. The seasons turned, and by February, the draft was 200 pages strong. In April, I had staked out a whopping 333 pages.

## The Review Process

### Entering the MAA’s Queue

I was finally beginning to run out of things I wanted to add, after about six months of endless typing. So I decided to reach out again; this time I contacted a professor (henceforth Z) whom I knew well from my time at the Berkeley Math Circle. After some discussion, Z agreed to look briefly at an early draft of the manuscript to get a feel for what it was like. I must have exceeded his expectations, because Z responded enthusiastically suggesting that I submit it to the Problem Book Series of the MAA. As it turns out, he was on the editorial board, so in just a few days my book was in the official queue.

This was all in April. The review process was scheduled to begin in June, and likely take the entirety of the summer. I was told that if I had a more revised draft before the review that I should also send it in.

It was then I decided I needed to get some feedback. So, I reached out to a few of my close friends asking them if they’d be willing to review drafts of the manuscript. This turned out not to go quite as well as I hoped, since

• Many people agreed eagerly, but then didn’t actually follow through with reading it chapter by chapter.
• I was stupid enough to send the entire manuscript rather than excerpts, and thus ran a huge risk of getting the text leaked. Fortunately, I have good friends, but it nagged at me for quite a while. Learned my lesson there.

That’s not to say it was completely useless; I did get some typos fixed. But just not as many as I hoped.

### The First Review

Not very much happened for the rest of the summer while I waited impatiently; it was a long four-month wait for me. Finally, at the end of August 2014, I got the comments from the board; I remember I was practicing the piano at Harvard when I saw the email.

There had been six reviews. While I won’t quote the exact reviews, I’ll briefly summarize them.

1. There is too much idiosyncratic terminology.
2. This is pretty impressive, but will need careful editing.
3. This project is fantastic; the author should be encouraged to continue.
4. This is well developed; may need some editing of contents since some topics are very advanced.
5. Overall I like this project. That said, it could benefit from some reading and editing. For example, here are some passages in particular that aren’t clear.
6. This manuscript reads well, written at a fairly high level. The motivation provided is especially good. It would be nice if there were some solutions or at least longer hints for the (many) problems in the text. Overall the author should be encouraged to continue.

The most surprising thing was how short the comments were. I had expected that, given the review had consumed the entire summer, the reviewers would at least have read the manuscript in detail. But it turns out that mostly all that had been obtained were cursory impressions from the board members: the first four reviews were only a few sentences long! The fifth review was more detailed, but it was essentially a “spot check”.

I admit, I was really at a loss as to how I should proceed. The comments were not terribly specific, and the only real actionable items were to use less extravagant terms in response to 1 (I originally had “configuration”, “exercise” vs “problem”, etc.) and to add solutions (in response to 6). When I showed the comments to Z, he commented that while they were positive, they seemed to suggest that publication might not be anytime soon. So I decided to try submitting a second draft to the MAA, but if that didn’t work I would fall back on the self-publishing route.

The reviewers had commented about finding a few typos, so I again enlisted the help of some friends of mine to eliminate them. This time I was a lot smarter. First, I only sent the relevant excerpts that I wanted them to read, and watermarked the PDFs with the names of the recipients. Second, this time I paid them as well: specifically, I gave $40 + \min(40, 0.1n^2)$ dollars for each chapter read, where $n$ was the number of errors found. I also gave a much clearer “I need this done by X” deadline. This worked significantly better than my first round of edits. Note to self: people feel more obliged to do a good job if you pay them!
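Just to illustrate the payment formula (a hypothetical helper; I certainly never needed code for this):

```python
def payout(n):
    """Per-chapter payment for a reader who found n errors:
    a $40 base plus a bonus of 0.1 * n^2, capped at $40."""
    return 40 + min(40, 0.1 * n * n)

# Base pay with no errors found; bonus caps out at 20 errors.
print(payout(0), payout(20), payout(100))
```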

All in all my friends probably eliminated about 500 errors.

I worked as rapidly as I could, and within four weeks I had the new version. The changes that I made were:

• In response to the first board comment, I eliminated some of the most extravagant terminology (“demonstration”, “configuration”, etc.) in favor of more conventional terms (“example”, “lemma”).
• I picked about 5-10 problems from each chapter and added full solutions for them. This inflated the manuscript by another 70 pages, for a new total of 400 pages.
• Many typos were corrected and passages revised, thanks to my team of readers.
• Some formatting changes; most notably, I got the idea to put theorems and lemmas in boxes using mdframed (most of my recent olympiad handouts have the same boxes).

I sent this out and sat back.

### The Second Review

What followed was another long waiting process for what again ended up being cursory comments. The delay between the first and second review was definitely the most frustrating part — there seemed to be nothing I could do other than sit and wait. I seriously considered dropping the MAA and self-publishing during this time.

I had been told to expect comments back in the spring. Finally, in early April I poked the editorial board again asking whether there had been any progress, and was horrified to find out that the process hadn’t even started due to a miscommunication. Fortunately, the editor was apologetic enough about the error that she asked the board to try to expedite the process a little. The comments then arrived in mid-May, six weeks afterwards.

There were eight reviewers this time. In addition to some stylistic changes suggested (e.g. avoid contractions), here were some of the main comments.

• The main complaint was that I had been a bit too informal. They were right on all counts here: in the draft I had sent, the chapters opened with quotes from years of MOP (which confused the board, for obvious reasons) and I had some snarky comments about high school geometry (since I happen to despise the way Euclidean geometry is taught in high school). I found it amusing that no one had brought it up until now, and happily obliged to fix them.
• Some reviewers had pointed out that some of the topics were very advanced. In fact, one of the reviewers actually recommended against publication of the book on the grounds that no one would want to buy it. Fortunately, the book ended up getting accepted anyways.
• In that vein, there were some remarks that this book, although it serves its target audience well, is written at a fairly advanced level.

Some of the reviews were cursory like before, but some of them were line-by-line readings of a random chapter, and so this time I had something more tangible to work with.

So I proceeded to make the changes. For the first time, I finally had the brains to start using git to track the changes I made to the book. This was an enormously good idea, and I wish I had done so earlier.

Here are some selected changes that were made (the full list of changes is quite long).

• Eliminated a bunch of snarky comments about high school, and the MOP quotes.
• Eliminated about 50 instances of unnecessary future tense.
• Eliminated the real product from the text.
• Added to and significantly improved the index of the book, making it far more complete.
• Fixed more references.
• Changed the title to “Euclidean Geometry in Mathematical Olympiads” (it was originally “Geometra Galactica”).
• Changed the name of Part II from “Dark Arts” to “Analytic Techniques”. (Hehe.)
• Added people to the acknowledgments.
• Changes in formatting: most notably, I changed the font size from 11pt to 10pt to decrease the page count, since my book was already twice as long as many of the other books in the series. This dropped me from about 400 pages back to about 350 pages.
• Fixed about 200 more typos. Thanks to those of you who found them!

I sent out the third draft just as June started, about three weeks after I had received the comments. (I like to work fast.)

### The Last Revisions

There were another two rounds afterwards. In late June, I got a small set of about three pages of additional typos and clarifying suggestions. I sent back the third draft one day later.

Six days later, I got back a list of four remaining edits to make. I sent an updated fourth draft 17 minutes after receiving those comments. Unfortunately, it then took another five weeks for the four changes I made to be acknowledged. Finally, in early August, the changes were approved and the editorial board forwarded an official recommendation to MAA to publish the book.

### Summary of Review Timeline

In summary, the timeline of the review process was

• First draft submitted: April 6, 2014
• Feedback received: August 28, 2014
• Second draft submitted: November 5, 2014
• Feedback received: May 19, 2015
• Third draft submitted: June 23, 2015
• Feedback received: June 29, 2015
• Fourth draft submitted: June 29, 2015
• Official recommendation to MAA made: August 2015

I think with traditional publishers there is a lot of waiting; my understanding is that the editorial board largely consists of volunteers, so this seems inevitable.

## Approval and Onwards

On September 3, 2015, I got the long-awaited message:

It is a pleasure to inform you that the MAA Council on Books has approved the recommendation of the MAA Problem Books editorial board to publish your manuscript, Euclidean Geometry in Mathematical Olympiads.

I got a fairly standard royalty contract from the publisher, which I signed without much thought.

### Editing

I was provided a total of zero math editors and one copy editor. It shows in the enormous list of errors (and this is after all the mistakes my friends helped me find).

Fortunately, my copy editor was quite good (and I have a lot of sympathy for this poor soul, who had to read every word of the entire manuscript). My Git history indicates that approximately 1000 corrections were made; on average, that is about 2 per page, which sounds about right. I got the corrections on hard copy in the mail: the entire printout of my book, marked throughout with red ink.

Many of the changes fell into general shapes:

• Capitalization. I was unwittingly inconsistent with “Law of Cosines” versus “Law of cosines” versus “law of cosines”, etc., and my copy editor noticed every one of these. Similarly, the cases of section and chapter titles were often inconsistent: should I use “Angle Chasing” or “Angle chasing”? The main point is to pick one convention and stick with it.
• My copy editor pointed out every time I used “Problems for this section” and had only one problem.
• Several unnecessary “quotes” and italics were deleted.
• Oxford commas. My god, so many Oxford commas. You just don’t notice when the IMO Shortlist says “the circle through the points E, G, and H” but the European Girls’ Olympiad says “show that KH, EM and BC are concurrent”. I swear there were at least 100 of these in the book. I tried to write a regular expression to find such mistakes, but there were lots of edge cases that came up, and I still had to do many of these manually.
• Inconsistency of em dashes and en dashes. This one worked better with regular expressions.

But of course there were plenty of other mistakes like missing spaces, missing degree spaces, punctuation errors, etc.
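For what it’s worth, here is a crude sketch of such a regular expression (my own reconstruction, not the pattern I actually used; as noted, the edge cases are what kill you):

```python
import re

# Flag serial lists missing the Oxford comma, e.g. "KH, EM and BC".
# Correct usages like "E, G, and H" have a comma before the conjunction
# and are not matched. This naive pattern misses plenty of real-world
# cases (line breaks, math, "and" inside names), hence the manual passes.
OXFORD = re.compile(r"\b\w+, (?:\w+, )*\w+ (?:and|or) \w+")

def find_suspects(text):
    return OXFORD.findall(text)

print(find_suspects("show that KH, EM and BC are concurrent"))
print(find_suspects("the circle through the points E, G, and H"))
```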

### Cover Art

This was handled for me by the publisher: they gave me a choice of five or so designs and I picked one I liked.

(If you are self-publishing, this is actually one of the hardest parts of the publishing logistics; you need to design the cover on your own.)

### Proofs

It turns out that after all the hard work I spent formatting the draft, the MAA has a standard template, and the production team re-typeset the entire book in that format. Fortunately, the publisher’s format is pretty similar to mine, so there were no huge cosmetic changes.

At this point I got the proofs, which are essentially the penultimate drafts of the book as they will be sent to the printers.

### Affiliation and Miscellany

There was a bit more back-and-forth with the publisher towards the end. For example, they asked me if I would like my affiliation to be listed as MIT or to have no affiliation; I chose the latter. I also sent them a bio and photograph, and filled out an author questionnaire asking for some standard details.

Marketing was handled by the publisher based on these details.

## The End

Without warning, I got an email on March 25 announcing that the PDF versions of my book were now available on the MAA website. The hard copies followed a few months afterwards. That marked the end of my publication process.

If I were to do this sort of thing again, I guess the main decision would be whether to self-publish or go through a formal publisher. The main disadvantages seem to be the time delay, and possibly also that the royalties are lower than in self-publishing. On the flip side, the advantages of a formal publisher were:

• Having a real copy editor read through the entire manuscript.
• Having a committee of outsiders knock some common sense into me (e.g. not calling the book “Geometra Galactica”).
• Having cover art and marketing completely done for me.
• It’s more prestigious; having a real published book is (for whatever reason) a very nice CV item.

Overall I think publishing formally was the right thing to do for this book, but your mileage may vary.

Other advice I would give to my past self, mentioned above already: keep paying O(1) for O(n), use git to keep track of all versions, and be conscious about which grammatical conventions to use (in particular, stay consistent).

Here’s a better concluding question: what surprised me about the process, i.e., what was different than what I expected? Here’s a partial list of answers:

• It took even longer than I was expecting. Large committees are inherently slow; this is no slight to the MAA, it is just how these sorts of things work.
• I was surprised that at no point did anyone really check the manuscript for mathematical accuracy. In hindsight this should have been obvious; I expect reading the entire book properly takes at least 1-2 years.
• I was astounded by how many errors there were in the text, mathematical, grammatical, and otherwise. During the entire process something like 2000 errors were corrected (admittedly many were minor, like Oxford commas). Yet even as I published the book, I knew that there had to be errors left. But it was still irritating to hear about them post-publication.

All in all, the entire process started in September 2013 and ended in March 2016, which is 30 months. The time was roughly 30% writing, 50% review, and 20% production.

# DNSCrypt Setup with PDNSD

Here are notes for setting up DNSCrypt on Arch Linux, using pdnsd as a DNS cache, assuming the use of NetworkManager. I needed it one day since the network I was using blocked traffic to external DNS servers (parental controls), and the DNS server provided had an outdated entry for hmmt.co. (My dad then pointed out to me I could have just hard-coded the necessary IP address in /etc/hosts, oops.)

For the whole process, useful commands to test with are:

• nslookup hmmt.co will tell you the IP used and the server from which it came.
• dig www.hmmt.co gives much more detailed information to this effect. (From bind-tools.)
• dig @127.0.0.1 www.hmmt.co lets you query a specific DNS server (in this case 127.0.0.1).
• drill @127.0.0.1 www.hmmt.co behaves similarly.

First, pacman -S pdnsd dnscrypt-proxy (with sudo, of course, but I’ll leave that out here and henceforth).

Run systemctl edit dnscrypt-proxy.socket and fill in override.conf with

[Socket]
ListenStream=
ListenDatagram=
ListenStream=127.0.0.1:40
ListenDatagram=127.0.0.1:40


Optionally, one can also specify which DNS resolver dnscrypt-proxy should use with systemctl edit dnscrypt-proxy.service. For example, for cs-uswest I write

[Service]
ExecStart=
ExecStart=/usr/bin/dnscrypt-proxy \
-R cs-uswest


The empty ExecStart= is necessary, since otherwise systemd will complain about multiple ExecStart commands.

This configures dnscrypt-proxy to listen on 127.0.0.1, port 40.

Now we configure pdnsd to listen on port 53 (default) for cache, and relay cache misses to dnscrypt-proxy. This is accomplished by using the following for /etc/pdnsd.conf:

global {
perm_cache = 1024;
cache_dir = "/var/cache/pdnsd";
run_as = "pdnsd";
server_ip = 127.0.0.1;
status_ctl = on;
query_method = udp_tcp;
min_ttl = 15m;       # Retain cached entries at least 15 minutes.
max_ttl = 1w;        # One week.
timeout = 10;        # Global timeout option (10 seconds).
neg_domain_pol = on;
udpbufsize = 1024;   # Upper limit on the size of UDP messages.
}

server {
label = "dnscrypt-proxy";
ip = 127.0.0.1;
port = 40;
timeout = 4;
proxy_only = on;
}

source {
owner = localhost;
file = "/etc/hosts";
}


Now it remains to change the DNS server from whatever default is used into 127.0.0.1. For NetworkManager users, it is necessary to edit /etc/NetworkManager/NetworkManager.conf to prevent it from overriding this file:

[main]
...
dns=none


This will cause resolv.conf to be written as an empty file by NetworkManager: in this case, the default 127.0.0.1 is used as the nameserver, which is what we want.
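If you prefer to be explicit rather than rely on the empty-file default, note that with dns=none in place NetworkManager no longer touches /etc/resolv.conf, so (as an optional variant) you can also write the loopback nameserver there yourself:

```
# /etc/resolv.conf
nameserver 127.0.0.1
```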

Needless to say, one finishes with

systemctl enable dnscrypt-proxy
systemctl start dnscrypt-proxy
systemctl enable pdnsd
systemctl start pdnsd


# A Sketchy Overview of Green-Tao

These are the notes of my last lecture in the 18.099 discrete analysis seminar. It is a very high-level overview of the Green-Tao theorem. It is a subset of this paper.

## 1. Synopsis

This post is an overview of the proof of:

Theorem 1 (Green-Tao)

The prime numbers contain arbitrarily long arithmetic progressions.

Here, Szemerédi’s theorem isn’t strong enough, because the primes have density approaching zero. Instead, one can try to prove the following “relative” result.

Theorem (Relative Szemerédi)

Let ${S}$ be a sparse “pseudorandom” set of integers. Then any subset ${A \subseteq S}$ with positive density relative to ${S}$ contains arbitrarily long arithmetic progressions.

In order to do this, we have to accomplish the following.

• Make precise the notion of “pseudorandom”.
• Prove the Relative Szemerédi theorem, and then
• Exhibit a “pseudorandom” set ${S}$ which subsumes the prime numbers.

This post will use the graph-theoretic approach to Szemerédi as in the exposition of David Conlon, Jacob Fox, and Yufei Zhao. In order to motivate the notion of pseudorandomness, we return to the graph-theoretic approach to Roth’s theorem, i.e. the case ${k=3}$ of Szemerédi’s theorem.

## 2. Defining the linear forms condition

### 2.1. Review of Roth’s theorem

Roth’s theorem can be phrased in two ways. The first is the “set-theoretic” formulation:

Theorem 2 (Roth, set version)

If ${A \subseteq \mathbb Z/N}$ is 3-AP-free, then ${|A| = o(N)}$.

The second is a “weighted” version

Theorem 3 (Roth, weighted version)

Fix ${\delta > 0}$. Let ${f : \mathbb Z/N \rightarrow [0,1]}$ with ${\mathbf E f \ge \delta}$. Then

$\displaystyle \Lambda_3(f,f,f) \ge \Omega_\delta(1).$

We sketch the idea of a graph-theoretic proof of the first theorem. We construct a tripartite graph ${G_A}$ on vertices ${X \sqcup Y \sqcup Z}$, where ${X = Y = Z = \mathbb Z/N}$. Then one creates the edges

• ${(x,y)}$ if ${2x+ y \in A}$,
• ${(x,z)}$ if ${x-z \in A}$, and
• ${(y,z)}$ if ${-y-2z \in A}$.

This construction is selected so that arithmetic progressions in ${A}$ correspond to triangles in the graph ${G_A}$. As a result, if ${A}$ has no 3-AP’s (except trivial ones, where ${x+y+z=0}$), the graph ${G_A}$ has exactly one triangle for every edge. Then, we can use the theorem of Ruzsa-Szemerédi, which states that this graph ${G_A}$ has ${o(n^2)}$ edges.
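As a quick sanity check of this correspondence (my own illustration, with an arbitrary small ${A}$, not part of the original argument), one can enumerate the triangles of ${G_A}$ directly and verify that each yields a 3-term progression:

```python
# Brute-force check: every triangle (x, y, z) of G_A gives the 3-term
# progression 2x+y, x-z, -y-2z in A, with common difference x+y+z;
# it is trivial exactly when x+y+z = 0 (mod N).
N, A = 13, {1, 2, 5, 11}
in_A = lambda v: v % N in A

triangles = [(x, y, z)
             for x in range(N) for y in range(N) for z in range(N)
             if in_A(2*x + y) and in_A(x - z) and in_A(-y - 2*z)]

for x, y, z in triangles:
    a, b, c = (2*x + y) % N, (x - z) % N, (-y - 2*z) % N
    assert (a - b) % N == (b - c) % N == (x + y + z) % N  # an AP in Z/N
```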

### 2.2. The measure ${\nu}$

Now for the generalized version, we start with the second version of Roth’s theorem. Instead of a set ${S}$, we consider a function

$\displaystyle \nu : \mathbb Z/N \rightarrow \mathbb R_{\ge 0}$

which we call a majorizing measure. Since we are now dealing with ${A}$ of low density, we normalize ${\nu}$ so that

$\displaystyle \mathbf E[\nu] = 1 + o(1).$

Our goal is to now show a result of the form:

Theorem (Relative Roth, informally, weighted version)

If ${0 \le f \le \nu}$, ${\mathbf E f \ge \delta}$, and ${\nu}$ satisfies a “pseudorandom” condition, then ${\Lambda_3(f,f,f) \ge \Omega_{\delta}(1)}$.

The prototypical example of course is that if ${A \subseteq S \subseteq \mathbb Z/N}$, then we let ${\nu(x) = \frac{N}{|S|} 1_S(x)}$.

### 2.3. Pseudorandomness for ${k=3}$

So, how should we formulate the pseudorandom condition? To start, consider ${G_S}$, the tripartite graph defined earlier, and let ${p = |S| / N}$; since ${S}$ is sparse we expect ${p}$ to be small. The main idea that turns out to be correct is: the number of embeddings of ${K_{2,2,2}}$ in ${G_S}$ is “as expected”, namely ${(1+o(1)) p^{12} N^6}$. Here ${K_{2,2,2}}$ is exactly the ${2}$-blow-up of a triangle. This condition gives us control over the distribution of triangles in the sparse graph ${G_S}$: knowing that we have approximately the correct count for ${K_{2,2,2}}$ is enough to control the distribution of triangles.

For technical reasons, in fact we want this to be true not only for ${K_{2,2,2}}$ but all of its subgraphs ${H}$.

Now, let’s move on to the weighted version. Let’s consider a tripartite graph ${G}$, which we can think of as a collection of three functions

\displaystyle \begin{aligned} \mu_{-z} &: X \times Y \rightarrow \mathbb R \\ \mu_{-y} &: X \times Z \rightarrow \mathbb R \\ \mu_{-x} &: Y \times Z \rightarrow \mathbb R. \end{aligned}

We think of ${\mu}$ as normalized so that ${\mathbf E[\mu_{-x}] = \mathbf E[\mu_{-y}] = \mathbf E[\mu_{-z}] = 1}$. Then we can define

Definition 4

A weighted tripartite graph ${\mu = (\mu_{-x}, \mu_{-y}, \mu_{-z})}$ satisfies the ${3}$-linear forms condition if

\displaystyle \begin{aligned} \mathbf E_{x^0,x^1,y^0,y^1,z^0,z^1} &\Big[ \mu_{-x}(y^0,z^0) \mu_{-x}(y^0,z^1) \mu_{-x}(y^1,z^0) \mu_{-x}(y^1,z^1) \\ & \mu_{-y}(x^0,z^0) \mu_{-y}(x^0,z^1) \mu_{-y}(x^1,z^0) \mu_{-y}(x^1,z^1) \\ & \mu_{-z}(x^0,y^0) \mu_{-z}(x^0,y^1) \mu_{-z}(x^1,y^0) \mu_{-z}(x^1,y^1) \Big] \\ &= 1 + o(1) \end{aligned}

and similarly if any of the twelve factors are deleted.

The pseudorandomness condition on ${\nu}$ is then phrased via the graph construction above:

Definition 5

A function ${\nu : \mathbb Z / N \rightarrow \mathbb R_{\ge 0}}$ satisfies the ${3}$-linear forms condition if ${\mathbf E[\nu] = 1 + o(1)}$, and the weighted tripartite graph ${\mu = (\mu_{-x}, \mu_{-y}, \mu_{-z})}$ defined by

\displaystyle \begin{aligned} \mu_{-z} &= \nu(2x+y) \\ \mu_{-y} &= \nu(x-z) \\ \mu_{-x} &= \nu(-y-2z) \end{aligned}

satisfies the ${3}$-linear forms condition.
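To see the shape of this condition concretely, here is a trivial sanity check (my own illustration, not from the exposition): for the constant measure ${\nu \equiv 1}$ the twelve-factor average is exactly ${1}$, and deleting factors obviously preserves this.

```python
from itertools import product

N = 5
nu = lambda t: 1.0  # constant measure; a pseudorandom nu should give 1 + o(1)

def forms_expectation():
    total = 0.0
    for x0, x1, y0, y1, z0, z1 in product(range(N), repeat=6):
        term = 1.0
        for x in (x0, x1):          # mu_{-z}(x, y) = nu(2x + y)
            for y in (y0, y1):
                term *= nu((2*x + y) % N)
        for x in (x0, x1):          # mu_{-y}(x, z) = nu(x - z)
            for z in (z0, z1):
                term *= nu((x - z) % N)
        for y in (y0, y1):          # mu_{-x}(y, z) = nu(-y - 2z)
            for z in (z0, z1):
                term *= nu((-y - 2*z) % N)
        total += term
    return total / N**6

print(forms_expectation())  # 1.0 for the constant measure
```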

Finally, the relative version of Roth’s theorem which we seek is:

Theorem 6 (Relative Roth)

Suppose ${\nu : \mathbb Z/N \rightarrow \mathbb R_{\ge 0}}$ satisfies the ${3}$-linear forms condition. Then for any ${f : \mathbb Z/N \rightarrow \mathbb R_{\ge 0}}$ bounded above by ${\nu}$ and satisfying ${\mathbf E[f] \ge \delta > 0}$, we have

$\displaystyle \Lambda_3(f,f,f) \ge \Omega_{\delta}(1).$

### 2.4. Relative Szemerédi

We of course have:

Theorem 7 (Szemerédi)

Suppose ${k \ge 3}$, and ${f : \mathbb Z/N \rightarrow [0,1]}$ with ${\mathbf E[f] \ge \delta}$. Then

$\displaystyle \Lambda_k(f, \dots, f) \ge \Omega_{\delta}(1).$

For ${k > 3}$, rather than considering weighted tripartite graphs, we consider a ${(k-1)}$-uniform ${k}$-partite hypergraph. For example, given ${\nu}$ with ${\mathbf E[\nu] = 1 + o(1)}$ and ${k=4}$, we use the construction

\displaystyle \begin{aligned} \mu_{-z}(w,x,y) &= \nu(3w+2x+y) \\ \mu_{-y}(w,x,z) &= \nu(2w+x-z) \\ \mu_{-x}(w,y,z) &= \nu(w-y-2z) \\ \mu_{-w}(x,y,z) &= \nu(-x-2y-3z). \end{aligned}

Thus 4-AP’s correspond to the simplex ${K_4^{(3)}}$ (i.e. a tetrahedron). We then consider the two-blow-up of the simplex, and require the analogous count for each of its subgraphs ${H}$.

Here is the compiled version:

Definition 8

A ${(k-1)}$-uniform ${k}$-partite weighted hypergraph ${\mu = (\mu_{-i})_{i=1}^k}$ satisfies the ${k}$-linear forms condition if

$\displaystyle \mathbf E_{x_1^0, x_1^1, \dots, x_k^0, x_k^1} \left[ \prod_{j=1}^k \prod_{\omega \in \{0,1\}^{[k] \setminus \{j\}}} \mu_{-j}\left( x_1^{\omega_1}, \dots, x_{j-1}^{\omega_{j-1}}, x_{j+1}^{\omega_{j+1}}, \dots, x_k^{\omega_k} \right)^{n_{j,\omega}} \right] = 1 + o(1)$

for all exponents ${n_{j,\omega} \in \{0,1\}}$.

Definition 9

A function ${\nu : \mathbb Z/N \rightarrow \mathbb R_{\ge 0}}$ satisfies the ${k}$-linear forms condition if ${\mathbf E[\nu] = 1 + o(1)}$, and

$\displaystyle \mathbf E_{x_1^0, x_1^1, \dots, x_k^0, x_k^1} \left[ \prod_{j=1}^k \prod_{\omega \in \{0,1\}^{[k] \setminus \{j\}}} \nu\left( \sum_{i=1}^k (j-i)x_i^{(\omega_i)} \right)^{n_{j,\omega}} \right] = 1 + o(1)$

for all exponents ${n_{j,\omega} \in \{0,1\}}$. This is just the previous condition with the natural ${\mu}$ induced by ${\nu}$.

The natural generalization of relative Szemerédi is then:

Theorem 10 (Relative Szemerédi)

Suppose ${k \ge 3}$, and ${\nu : \mathbb Z/N \rightarrow \mathbb R_{\ge 0}}$ satisfies the ${k}$-linear forms condition. Let ${f : \mathbb Z/N \rightarrow \mathbb R_{\ge 0}}$ with ${\mathbf E[f] \ge \delta}$, ${f \le \nu}$. Then

$\displaystyle \Lambda_k(f, \dots, f) \ge \Omega_{\delta}(1).$

## 3. Outline of proof of Relative Szemerédi

The proof of Relative Szemerédi uses two key facts. First, one replaces ${f}$ with a bounded function ${\widetilde f}$ which is close to it:

Theorem 11 (Dense model)

Let ${\varepsilon > 0}$. There exists ${\varepsilon' > 0}$ such that if:

• ${\nu : \mathbb Z/N \rightarrow \mathbb R_{\ge 0}}$ satisfies ${\left\lVert \nu-1 \right\rVert^{\square}_r \le \varepsilon'}$, and
• ${f : \mathbb Z/N \rightarrow \mathbb R_{\ge 0}}$, ${f \le \nu}$

then there exists a function ${\widetilde f : \mathbb Z/N \rightarrow [0,1]}$ such that ${\left\lVert f - \widetilde f \right\rVert^{\square}_r \le \varepsilon}$.

Here we have a new norm, called the cut norm, defined by

$\displaystyle \left\lVert f \right\rVert^{\square}_r = \sup_{A_i \subseteq (\mathbb Z/N)^{r-1}} \left\lvert \mathbf E_{x_1, \dots, x_r} f(x_1 + \dots + x_r) 1_{A_1}(x_{-1}) \dots 1_{A_r}(x_{-r}) \right\rvert.$

This is actually an extension of the cut norm defined on an ${r}$-uniform ${r}$-partite hypergraph (not ${(r-1)}$-uniform like before!): if ${g : X_1 \times \dots \times X_r \rightarrow \mathbb R}$ is such a weighted hypergraph, we let

$\displaystyle \left\lVert g \right\rVert^{\square}_{r,r} = \sup_{A_i \subseteq X_{-i}} \left\lvert \mathbf E_{x_1, \dots, x_r} g(x_1, \dots, x_r) 1_{A_1}(x_{-1}) \dots 1_{A_r}(x_{-r}) \right\rvert.$

Taking ${g(x_1, \dots, x_r) = f(x_1 + \dots + x_r)}$, ${X_1 = \dots = X_r = \mathbb Z/N}$ gives the analogy.

For the second theorem, we define the norm

$\displaystyle \left\lVert g \right\rVert^{\square}_{k-1,k} = \max_{i=1,\dots,k} \left( \left\lVert g_{-i} \right\rVert^{\square}_{k-1, k-1} \right).$

Theorem 12 (Relative simplex counting lemma)

Let ${\mu}$, ${g}$, ${\widetilde g}$ be ${(k-1)}$-uniform ${k}$-partite weighted hypergraphs on ${X_1 \cup \dots \cup X_k}$. Assume that ${\mu}$ satisfies the ${k}$-linear forms condition, ${0 \le g_{-i} \le \mu_{-i}}$ for all ${i}$, and ${0 \le \widetilde g \le 1}$. If ${\left\lVert g-\widetilde g \right\rVert^{\square}_{k-1,k} = o(1)}$ then

$\displaystyle \mathbf E_{x_1, \dots, x_k} \left[ g(x_{-1}) \dots g(x_{-k}) - \widetilde g(x_{-1}) \dots \widetilde g(x_{-k}) \right] = o(1).$

One then combines these two results to prove Relative Szemerédi, as follows. Start with ${f}$ and ${\nu}$ as in the theorem. The ${k}$-linear forms condition turns out to imply ${\left\lVert \nu-1 \right\rVert^{\square}_{k-1} = o(1)}$, so we can find a nearby ${\widetilde f}$ by the dense model theorem. Then, we induce ${\mu}$, ${g}$, ${\widetilde g}$ from ${\nu}$, ${f}$, ${\widetilde f}$ respectively. The counting lemma then reduces the bounding of ${\Lambda_k(f, \dots, f)}$ to the bounding of ${\Lambda_k(\widetilde f, \dots, \widetilde f)}$, which is ${\Omega_\delta(1)}$ by the usual Szemerédi theorem.

## 4. Arithmetic progressions in primes

We now sketch how to obtain Green-Tao from Relative Szemerédi. As expected, we need to use the von Mangoldt function ${\Lambda}$.

Unfortunately, ${\Lambda}$ is biased (e.g. “all decent primes are odd”). To get around this, we let ${w = w(N)}$ tend to infinity slowly with ${N}$, and define

$\displaystyle W = \prod_{p \le w} p.$

In the ${W}$-trick we consider only primes ${\equiv 1 \pmod W}$. The modified von Mangoldt function is then defined by

$\displaystyle \widetilde \Lambda(n) = \begin{cases} \frac{\varphi(W)}{W} \log (Wn+1) & Wn+1 \text{ prime} \\ 0 & \text{else}. \end{cases}$

In accordance with Dirichlet’s theorem on primes in arithmetic progressions, we have ${\sum_{n \le N} \widetilde \Lambda(n) = N + o(N)}$.
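To make the normalization concrete, here is a quick numerical check (my own; the choice ${w = 2}$, hence ${W = 2}$, is purely for speed) that ${\sum_{n \le N} \widetilde\Lambda(n)}$ is indeed close to ${N}$:

```python
import math

# The factor phi(W)/W is what makes the average of tilde-Lambda come out to 1.
def isprime(m):
    return m >= 2 and all(m % d for d in range(2, math.isqrt(m) + 1))

W, phi_W = 2, 1          # W = product of primes <= w; phi_W = Euler phi of W
N = 5000
total = sum(phi_W / W * math.log(W * n + 1)
            for n in range(1, N + 1) if isprime(W * n + 1))
print(total / N)  # close to 1, consistent with N + o(N)
```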

So, we need to show now that

Proposition 13

Fix ${k \ge 3}$. We can find ${\delta = \delta(k) > 0}$ such that for ${N \gg 1}$ prime, we can find ${\nu : \mathbb Z/N \rightarrow \mathbb R_{\ge 0}}$ which satisfies the ${k}$-linear forms condition as well as

$\displaystyle \nu(n) \ge \delta \widetilde \Lambda(n)$

for ${N/2 \le n < N}$.

In that case, we can let

$\displaystyle f(n) = \begin{cases} \delta \widetilde\Lambda(n) & N/2 \le n < N \\ 0 & \text{else}. \end{cases}$

Then ${0 \le f \le \nu}$. The presence of ${N/2 \le n < N}$ allows us to avoid “wrap-around issues” that arise from using ${\mathbb Z/N}$ instead of ${\mathbb Z}$. Relative Szemerédi then yields the result.

For completeness, we state the construction. Let ${\chi : \mathbb R \rightarrow [0,1]}$ be supported on ${[-1,1]}$ with ${\chi(0) = 1}$, and define a normalizing constant ${c_\chi = \int_0^\infty \left\lvert \chi'(x) \right\rvert^2 \; dx}$. Inspired by ${\Lambda(n) = \sum_{d \mid n} \mu(d) \log(n/d)}$, we define a truncated ${\Lambda}$ by

$\displaystyle \Lambda_{\chi, R}(n) = \log R \sum_{d \mid n} \mu(d) \chi\left( \frac{\log d}{\log R} \right).$

Let ${k \ge 3}$, ${R = N^{k^{-1} 2^{-k-3}}}$. Now, we define ${\nu}$ by

$\displaystyle \nu(n) = \begin{cases} \dfrac{\varphi(W)}{W} \dfrac{\Lambda_{\chi,R}(Wn+1)^2}{c_\chi \log R} & N/2 \le n < N \\ 0 & \text{else}. \end{cases}$

This turns out to work, provided ${w}$ grows sufficiently slowly in ${N}$.

# Formal vs Functional Series (OR: Generating Function Voodoo Magic)

Epistemic status: highly dubious. I found almost no literature doing anything quite like what follows, which unsettles me because it makes it likely that I’m overcomplicating things significantly.

## 1. Synopsis

Recently I was working on an elegant problem which was the original problem 6 for the 2015 International Math Olympiad, which reads as follows:

Problem

[IMO Shortlist 2015 Problem C6] Let ${S}$ be a nonempty set of positive integers. We say that a positive integer ${n}$ is clean if it has a unique representation as a sum of an odd number of distinct elements from ${S}$. Prove that there exist infinitely many positive integers that are not clean.

Proceeding by contradiction, one can prove (try it!) that in fact all sufficiently large integers have exactly one representation as a sum of an even number of distinct elements of ${S}$ as well. Then, the problem reduces to the following:

Problem

Show that if ${s_1 < s_2 < \dots}$ is an increasing sequence of positive integers and ${P(x)}$ is a nonzero polynomial then we cannot have

$\displaystyle \prod_{j=1}^\infty (1 - x^{s_j}) = P(x)$

as formal series.

To see this, note that all sufficiently large ${x^N}$ have coefficient ${1 + (-1) = 0}$. Now, the intuitive idea is obvious: the root ${1}$ appears with finite multiplicity in ${P}$ so we can put ${P(x) = (1-x)^k Q(x)}$ where ${Q(1) \neq 0}$, and then we get that ${1-x}$ on the RHS divides ${P}$ too many times, right?

Well, there are some obvious issues with this “proof”: for example, consider the equality

$\displaystyle 1 = (1-x)(1+x)(1+x^2)(1+x^4)(1+x^8) \dots.$

The right-hand side is “divisible” by ${1-x}$, but the left-hand side is not (as a polynomial).

But we still want to use the idea of plugging ${x \rightarrow 1^-}$, so what is the right thing to do? It turns out that this is a complete minefield, and there are a lot of very subtle distinctions that seem to not be explicitly mentioned in many places. I think I have a complete answer now, but it’s long enough to warrant this entire blog post.

Here’s the short version: there are actually two distinct notions of “generating function”, namely the “formal series” and the “functional series”. They use exactly the same notation but are two different types of objects, and this ends up being the source of lots of errors, because “formal series” do not allow substituting ${x}$, while “functional series” do.

Spoiler: we’ll need the asymptotic for the partition function ${p(n)}$.

## 2. Formal Series ${\neq}$ Functional Series

I’m assuming you’ve all heard the definition of ${\sum_k c_kx^k}$. It turns out unfortunately that this isn’t everything: there are actually two types of objects at play here. They are usually called formal power series and power series, but for this post I will use the more descriptive names formal series and functional series. I’ll do everything over ${\mathbb C}$, but one can of course use ${\mathbb R}$ instead.

The formal series is easier to describe:

Definition 1

A formal series ${F}$ is an infinite sequence ${(a_n)_n = (a_0, a_1, a_2, \dots)}$ of complex numbers. We often denote it by ${\sum a_nx^n = a_0 + a_1x + a_2x^2 + \dots}$. The set of formal series is denoted ${\mathbb C[ [x] ]}$.

This is the “algebraic” viewpoint: it’s a sequence of coefficients. Note that there is no worry about convergence issues or “plugging in ${x}$”.

On the other hand, a functional series is more involved, because it has to support substitution of values of ${x}$ and worry about convergence issues. So here are the necessary pieces of data:

Definition 2

A functional series ${G}$ (centered at zero) is a function ${G : U \rightarrow \mathbb C}$, where ${U}$ is an open disk centered at ${0}$ or ${U = \mathbb C}$. We require that there exists an infinite sequence ${(c_0, c_1, c_2, \dots)}$ of complex numbers satisfying

$\displaystyle \forall z \in U: \qquad G(z) = \lim_{N \rightarrow \infty} \left( \sum_{k=0}^N c_k z^k \right).$

(The limit is taken in the usual metric of ${\mathbb C}$.) In that case, the ${c_i}$ are unique and called the coefficients of ${G}$.

This is often written as ${G(x) = \sum_n c_n x^n}$, with the open set ${U}$ suppressed.

Remark 3

Some remarks on the definition of functional series:

• This is enough to imply that ${G}$ is holomorphic (and thus analytic) on ${U}$.
• For experts: note that I’m including the domain ${U}$ as part of the data required to specify ${G}$, which makes the presentation cleaner. Most sources do something with “radius of convergence”; I will blissfully ignore this, leaving this data implicitly captured by ${U}$.
• For experts: perhaps non-standardly, I require ${U \neq \{0\}}$; otherwise I can’t take derivatives, etc.

Thus formal and functional series, despite having the same notation, have different types: a formal series ${F}$ is a sequence, while a functional series ${G}$ is a function that happens to be expressible as an infinite sum within its domain.

Of course, from every functional series ${G}$ we can extract its coefficients and make them into a formal series ${F}$. So, for lack of better notation:

Definition 4

If ${F = (a_n)_n}$ is a formal series, and ${G : U \rightarrow \mathbb C}$ is a functional series whose coefficients equal ${F}$, then we write ${F \simeq G}$.

## 3. Finite operations

Now that we have formal and functional series, we can define sums. Since these are different types of objects, we will have to run definitions in parallel and then ideally check that they respect ${\simeq}$.

For formal series:

Definition 5

Let ${F_1 = (a_n)_n}$ and ${F_2 = (b_n)_n}$ be formal series. Then we set

\displaystyle \begin{aligned} (a_n)_n \pm (b_n)_n &= (a_n \pm b_n)_n \\ (a_n)_n \cdot (b_n)_n &= \left( \textstyle\sum_{j=0}^n a_jb_{n-j} \right)_n. \end{aligned}

This makes ${\mathbb C[ [x] ]}$ into a ring, with additive identity ${(0,0,0,\dots)}$ and multiplicative identity ${(1,0,0,\dots)}$.

We also define the derivative of ${F = (a_n)_n}$ by ${F' = ((n+1)a_{n+1})_n}$.

It’s probably more intuitive to write these definitions as

\displaystyle \begin{aligned} \sum_n a_n x^n \pm \sum_n b_n x^n &= \sum_n (a_n \pm b_n) x^n \\ \left( \sum_n a_n x^n \right) \left( \sum_n b_n x^n \right) &= \sum_n \left( \sum_{j=0}^n a_jb_{n-j} \right) x^n \\ \left( \sum_n a_n x^n \right)' &= \sum_n na_n x^{n-1} \end{aligned}

and in what follows I’ll start to use ${\sum_n a_nx^n}$ more. But officially, all definitions for formal series are in terms of the coefficients alone; the presence of ${x}$ serves as motivation only.
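The operations above can be modeled computationally; here is a minimal sketch of my own (not from the post), representing formal series as coefficient lists truncated to a fixed length:

```python
# Formal series as truncated coefficient lists of a common length.
def add(a, b):
    return [x + y for x, y in zip(a, b)]

def mul(a, b):
    # Cauchy product, truncated to len(a) coefficients
    return [sum(a[j] * b[k - j] for j in range(k + 1)) for k in range(len(a))]

def deriv(a):
    return [(k + 1) * a[k + 1] for k in range(len(a) - 1)] + [0]

# (1 - x) * (1 + x + x^2 + ...) = 1, up to truncation:
one_minus_x = [1, -1, 0, 0, 0, 0]
geom = [1, 1, 1, 1, 1, 1]
print(mul(one_minus_x, geom))  # [1, 0, 0, 0, 0, 0]
```

Note that no ${x}$ is ever substituted: everything is coefficient arithmetic, matching the official definitions.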

Exercise 6

Show that if ${F = \sum_n a_nx^n}$ is a formal series, then it has a multiplicative inverse if and only if ${a_0 \neq 0}$.

On the other hand, with functional series, the above operations are even simpler:

Definition 7

Let ${G_1 : U \rightarrow \mathbb C}$ and ${G_2 : U \rightarrow \mathbb C}$ be functional series with the same domain ${U}$. Then ${G_1 \pm G_2}$ and ${G_1 \cdot G_2}$ are defined pointwise.

If ${G : U \rightarrow \mathbb C}$ is a functional series (hence holomorphic), then ${G'}$ is defined pointwise.

If ${G}$ is nonvanishing on ${U}$, then ${1/G : U \rightarrow \mathbb C}$ is defined pointwise (and otherwise is not defined).

Now, for these finite operations, everything works as you expect:

Theorem 8 (Compatibility of finite operations)

Suppose ${F}$, ${F_1}$, ${F_2}$ are formal series, and ${G}$, ${G_1}$, ${G_2}$ are functional series ${U \rightarrow \mathbb C}$. Assume ${F \simeq G}$, ${F_1 \simeq G_1}$, ${F_2 \simeq G_2}$.

• ${F_1 \pm F_2 \simeq G_1 \pm G_2}$ and ${F_1 \cdot F_2 \simeq G_1 \cdot G_2}$.
• ${F' \simeq G'}$.
• If ${1/G}$ is defined, then ${1/F}$ is defined and ${1/F \simeq 1/G}$.

So far so good: everything works as expected as long as we restrict to finite operations. But once we step beyond that, things begin to go haywire.

## 4. Limits

We need to start considering limits of ${(F_k)_k}$ and ${(G_k)_k}$, since we are trying to make progress towards infinite sums and products. Once we do this, things start to burn.

Definition 9

Let ${F_1 = \sum_n a_n x^n}$ and ${F_2 = \sum_n b_n x^n}$ be formal series, and define the distance between them by

$\displaystyle d(F_1, F_2) = \begin{cases} 2^{-n} & a_n \neq b_n, \; n \text{ minimal} \\ 0 & F_1 = F_2. \end{cases}$

This function makes ${\mathbb C[[x]]}$ into a metric space, so we can discuss limits in this space. Actually, it is a normed vector space obtained by ${\left\lVert F \right\rVert = d(F,0)}$ above.

Thus, ${\lim_{k \rightarrow \infty} F_k = F}$ means that each coefficient of ${x^n}$ eventually stabilizes as ${k \rightarrow \infty}$. For example, the formal series ${(1,-1,0,0,\dots)}$, ${(1,0,-1,0,\dots)}$, ${(1,0,0,-1,\dots)}$, … converge to ${1 = (1,0,0,0,\dots)}$, which we write as

$\displaystyle \lim_{k \rightarrow \infty} (1 - x^k) = 1 \qquad \text{as formal series}.$
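This metric is easy to model on truncated coefficient lists (my own illustration, not from the post); note that ${d(1 - x^k, 1) = 2^{-k} \rightarrow 0}$, matching the formal limit just stated.

```python
# d(F1, F2) = 2^{-n}, where n is the first index at which coefficients differ.
def d(a, b):
    for n, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return 2.0 ** (-n)
    return 0.0

M = 8  # truncation length
one = [1] + [0] * (M - 1)
for k in range(1, M):
    Fk = [1] + [0] * (M - 1)
    Fk[k] = -1                          # F_k = 1 - x^k
    assert d(Fk, one) == 2.0 ** (-k)    # so F_k -> 1 as formal series
```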

As for functional series, since they are functions on the same open set ${U}$, we can use pointwise convergence or the stronger uniform convergence; we’ll say explicitly which one we’re doing.

Example 10 (Limits don’t work at all)

In what follows, ${F_k \simeq G_k}$ for every ${k}$.

• Here is an example showing that if ${\lim_k F_k = F}$, the functions ${G_k}$ may not converge even pointwise. Indeed, just take ${F_k = 1 - x^k}$ as before, and let ${U = \{ z : |z| < 2 \}}$.
• Here is an example showing that even if ${G_k \rightarrow G}$ uniformly, ${\lim_k F_k}$ may not exist. Take ${G_k = 1 - 1/k}$ as constant functions. Then ${G_k \rightarrow 1}$ uniformly, but ${\lim_k F_k}$ doesn’t exist, because the constant term converges as a real number without ever being eventually constant.
• The following example from this math.SE answer by Robert Israel shows that it’s possible that ${F = \lim_k F_k}$ exists, and ${G_k \rightarrow G}$ pointwise, and still ${F \not\simeq G}$. Let ${U}$ be the open unit disk, and set

\displaystyle \begin{aligned} A_k &= \{z = r e^{i\theta} \mid 2/k \le r \le 1, \; 0 \le \theta \le 2\pi - 1/k\} \\ B_k &= \left\{ |z| \le 1/k \right\} \end{aligned}

for ${k \ge 1}$. By Runge’s theorem there’s a polynomial ${p_k(z)}$ such that

$\displaystyle |p_k(z) - 1/z^{k}| < 1/k \text{ on } A_k \qquad \text{and} \qquad |p_k(z)| < 1/k \text{ on }B_k.$

Then

$\displaystyle G_k(z) = z^{k+1} p_k(z)$

is the desired counterexample (with ${F_k}$ the coefficient sequence of ${G_k}$). Indeed, by construction ${\lim_k F_k = 0}$, since ${\left\lVert F_k \right\rVert \le 2^{-k}}$ for each ${k}$. Alas, ${|G_k(z) - z| \le 2/k}$ for ${z \in A_k \cup B_k}$, so ${G_k}$ converges pointwise to the identity function ${z \mapsto z}$.

To be fair, we do have the following saving grace:

Theorem 11 (Uniform convergence and both limits exist is sufficient)

Suppose that ${G_k \rightarrow G}$ converges uniformly. Then if ${F_k \simeq G_k}$ for every ${k}$, and ${\lim_k F_k = F}$, then ${F \simeq G}$.

Proof: Here is a proof, adapted from this math.SE answer by Joey Zhou. WLOG ${G = 0}$, and let ${g_n(z) = \sum_k a^{(n)}_kz^k}$; it suffices to show that the coefficients of ${F}$ all vanish. Choose any ${r > 0}$ small enough that the closed disk ${\{ |z| \le r \}}$ lies inside ${U}$. By Cauchy’s integral formula, we have

\displaystyle \begin{aligned} \left|a^{(n)}_k\right| &= \left|\frac{1}{2\pi i} \int\limits_{|z|=r}{\frac{g_n(z)}{z^{k+1}}\text{ d}z}\right| \\ & \le\frac{1}{2\pi}(2\pi r)\frac{1}{r^{k+1}}\max\limits_{|z|=r}{|g_n(z)|} \xrightarrow{n\rightarrow\infty} 0 \end{aligned}

since ${g_n}$ converges uniformly to ${0}$ on ${U}$. Hence ${\lim_{n\rightarrow\infty} a^{(n)}_k = 0}$ for each ${k}$. But ${\lim_n F_n = F}$ means that for each fixed ${k}$ the coefficient ${a^{(n)}_k}$ is eventually constant, equal to the ${k}$-th coefficient of ${F}$; hence every coefficient of ${F}$ is zero, as needed. $\Box$

The take-away from this section is that limits are relatively poorly behaved.

## 5. Infinite sums and products

Naturally, infinite sums and products are defined by taking the limit of partial sums and partial products. The following example (from math.SE again) shows the nuances of this behavior.

Example 12 (On ${e^{1+x}}$)

The expression

$\displaystyle \sum_{n=0}^\infty \frac{(1+x)^n}{n!} = \lim_{N \rightarrow \infty} \sum_{n=0}^N \frac{(1+x)^n}{n!}$

does not make sense as a formal series: we observe that for every ${N}$ the constant term of the partial sum changes.

But this does converge (uniformly, even) to a functional series on ${U = \mathbb C}$, namely to ${e^{1+x}}$.
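One can watch both phenomena numerically (my own illustration, not from the post): the constant term of the ${N}$-th partial sum is ${\sum_{n \le N} 1/n!}$, which changes at every step, so there is no formal limit; yet as a number it converges to ${e}$, consistent with the functional limit ${e^{1+x}}$ at ${x = 0}$.

```python
import math

# Constant terms of the partial sums of sum_n (1+x)^n / n!, evaluated formally.
consts = [sum(1 / math.factorial(n) for n in range(N + 1)) for N in range(10)]
assert all(consts[i] != consts[i + 1] for i in range(9))  # never stabilizes
print(consts[-1], math.e)  # yet the values agree to about six decimal places
```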

Exercise 13

Let ${(F_k)_{k \ge 1}}$ be formal series.

• Show that an infinite sum ${\sum_{k=1}^\infty F_k(x)}$ converges as formal series exactly when ${\lim_k \left\lVert F_k \right\rVert = 0}$.
• Assume for convenience ${F_k(0) = 1}$ for each ${k}$. Show that an infinite product ${\prod_{k=0}^{\infty} (1+F_k)}$ converges as formal series exactly when ${\lim_k \left\lVert F_k-1 \right\rVert = 0}$.

Now the upshot is that one example of a convergent formal sum is the expression ${\lim_{N} \sum_{n=0}^N a_nx^n}$ itself! This means we can use standard “radius of convergence” arguments to transfer a formal series into a functional one.

Theorem 14 (Constructing ${G}$ from ${F}$)

Let ${F = \sum a_nx^n}$ be a formal series and let

$\displaystyle r = \frac{1}{\limsup_n \sqrt[n]{|a_n|}}.$

If ${r > 0}$ then there exists a functional series ${G}$ on ${U = \{ |z| < r \}}$ such that ${F \simeq G}$.

Proof: Let ${F_k}$ and ${G_k}$ be the corresponding partial sums of ${a_0x^0}$ through ${a_kx^k}$. Then by the Cauchy-Hadamard theorem, we have ${G_k \rightarrow G}$ uniformly on compact subsets of ${U}$, so Theorem 11 applies on each smaller disk ${\{ |z| < s \}}$ with ${s < r}$. Also, ${\lim_k F_k = F}$ by construction. $\Box$

This works less well with products: for example we have

$\displaystyle 1 \equiv (1-x) \prod_{j \ge 0} (1+x^{2^j})$

as formal series, but we can’t “plug in ${x=1}$”, for example.

## 6. Finishing the original problem

We finally return to the original problem: we wish to show that the equality

$\displaystyle P(x) = \prod_{j=1}^\infty (1 - x^{s_j})$

cannot hold as formal series. Tacitly, this just means

$\displaystyle \lim_{N \rightarrow \infty} \prod_{j=1}^N\left( 1 - x^{s_j} \right) = P(x)$

as formal series.

Here is a solution obtained by considering coefficients only, presented by Qiaochu Yuan in this MathOverflow question.

Both sides have constant coefficient ${1}$, so we may invert them; thus it suffices to show we cannot have

$\displaystyle \frac{1}{P(x)} = \frac{1}{\prod_{j=1}^{\infty} (1 - x^{s_j})}$

as formal power series.

The coefficients on the LHS have asymptotic growth a polynomial times an exponential.

On the other hand, the coefficients of the RHS can be shown to have growth both strictly larger than any polynomial (by truncating the product) and strictly smaller than any exponential (by comparing to the growth rate in the case where ${s_j = j}$, which gives the partition function ${p(n)}$ mentioned before). So the two rates of growth can’t match.
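For the partition-function comparison, the standard dynamic program computes ${p(n)}$ (my own addition, not part of the original argument); its super-polynomial but sub-exponential growth is already visible for small ${n}$.

```python
# Partition function p(n): process parts 1..n one at a time;
# p[t] counts partitions of t using the parts introduced so far.
def partitions(n):
    p = [1] + [0] * n
    for part in range(1, n + 1):
        for total in range(part, n + 1):
            p[total] += p[total - part]
    return p[n]

print([partitions(n) for n in (10, 20, 50)])  # [42, 627, 204226]
```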

# New algebra handouts on my website

For olympiad students: I have now published some new algebra handouts. They are:

• Introduction to Functional Equations, which covers the basic techniques and theory for FE’s typically appearing on olympiads like USA(J)MO.
• Monsters, an advanced handout which covers functional equations that have pathological solutions. It covers in detail the solutions to the Cauchy functional equation.
• Summation, which is a compilation of various types of olympiad-style sums like generating functions and multiplicative number theory.
• English, notes on proof-writing that I used at the 2016 MOP (Mathematical Olympiad Summer Program).

You can download all these (and other handouts) from my MIT website. Enjoy!

# Approximating E3-LIN is NP-Hard

This lecture, which I gave for my 18.434 seminar, focuses on the MAX-E3LIN problem. We prove that approximating it is NP-hard by a reduction from LABEL-COVER.

## 1. Introducing MAX-E3LIN

In the MAX-E3LIN problem, our input is a series of linear equations ${\pmod 2}$ in ${n}$ binary variables, each with three terms. Equivalently, one can think of this as ${\pm 1}$ variables and ternary products. The objective is to maximize the fraction of satisfied equations.

Example 1 (Example of MAX-E3LIN instance)

\displaystyle \begin{aligned} x_1 + x_3 + x_4 &\equiv 1 \pmod 2 \\ x_1 + x_2 + x_4 &\equiv 0 \pmod 2 \\ x_1 + x_2 + x_5 &\equiv 1 \pmod 2 \\ x_1 + x_3 + x_5 &\equiv 1 \pmod 2 \end{aligned}

\displaystyle \begin{aligned} x_1 x_3 x_4 &= -1 \\ x_1 x_2 x_4 &= +1 \\ x_1 x_2 x_5 &= -1 \\ x_1 x_3 x_5 &= -1 \end{aligned}

A diligent reader can check that we may obtain ${\frac34}$ but not ${1}$.
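In lieu of diligence, a brute force over all ${2^5}$ assignments (my own verification of the claim above) confirms the value is ${\frac34}$:

```python
from itertools import product

# The four equations of the example, in the multiplicative (+-1) form.
eqs = [((1, 3, 4), -1), ((1, 2, 4), +1), ((1, 2, 5), -1), ((1, 3, 5), -1)]
best = 0
for assignment in product([-1, +1], repeat=5):
    x = (None,) + assignment  # 1-indexed variables x_1 .. x_5
    sat = sum(x[i] * x[j] * x[k] == rhs for (i, j, k), rhs in eqs)
    best = max(best, sat)
print(best, "of", len(eqs))  # 3 of 4
```

(Indeed, the product of all four left-hand sides is ${+1}$ while the product of the right-hand sides is ${-1}$, so all four can never hold simultaneously.)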

Remark 2

We immediately notice that

• If there’s a solution with value ${1}$, we can find it easily with ${\mathbb F_2}$ linear algebra.
• It is always possible to get at least ${\frac{1}{2}}$ by selecting all-zero or all-one.
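The second observation follows since every equation has exactly three terms: the all-one assignment satisfies precisely the equations with right-hand side ${1}$, the all-zero assignment those with right-hand side ${0}$, and one of the two classes contains at least half the equations. A quick check on Example 1 (reusing the same encoding as before):

```python
# the Example 1 equations, as (variable indices, right-hand side) mod 2
eqns = [((1, 3, 4), 1), ((1, 2, 4), 0), ((1, 2, 5), 1), ((1, 3, 5), 1)]

def frac_satisfied(bit):
    """Value of the constant assignment x_i = bit; each LHS has 3 terms."""
    sat = sum(1 for (idx, r) in eqns if (3 * bit) % 2 == r)
    return sat / len(eqns)

best_constant = max(frac_satisfied(0), frac_satisfied(1))
assert best_constant >= 0.5
```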

The theorem we will prove today is that these “obvious” observations are essentially the best ones possible! Our main result is that improving the above constants to 51% and 99%, say, is NP-hard.

Theorem 3 (Hardness of MAX-E3LIN)

The ${\frac{1}{2}+\varepsilon}$ vs. ${1-\delta}$ decision problem for MAX-E3LIN is NP-hard.

This means it is NP-hard to decide whether a MAX-E3LIN instance has value ${\le \frac{1}{2}+\varepsilon}$ or ${\ge 1-\delta}$ (given that it is one or the other). A direct corollary is that approximating MAX-SAT is also NP-hard.

Corollary 4

The ${\frac78+\varepsilon}$ vs. ${1-\delta}$ decision problem for MAX-SAT is NP-hard.

Remark 5

The constant ${\frac78}$ is optimal in light of a random assignment. In fact, one can replace ${1-\delta}$ with ${\delta}$, but we don’t do so here.

Proof: Given an equation ${a+b+c=1}$ in MAX-E3LIN, we consider the four clauses ${a \lor b \lor c}$, ${a \lor \neg b \lor \neg c}$, ${\neg a \lor b \lor \neg c}$, ${\neg a \lor \neg b \lor c}$ (those with an even number of negations). Any assignment satisfies either three or four of them, with four occurring exactly when ${a+b+c \equiv 1}$. One does the analogous construction with clauses having an odd number of negations for ${a+b+c=0}$. $\Box$
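The gadget can be verified exhaustively; here is a sketch for the ${a+b+c=1}$ case, using the four clauses with an even number of negated literals:

```python
import itertools

def clause_val(lits, assign):
    """lits: list of (var, negated); a clause holds if any literal is true."""
    return any(assign[v] != neg for (v, neg) in lits)

# gadget for a + b + c = 1 (mod 2): the clauses with evenly many negations
clauses = [
    [('a', False), ('b', False), ('c', False)],
    [('a', False), ('b', True),  ('c', True)],
    [('a', True),  ('b', False), ('c', True)],
    [('a', True),  ('b', True),  ('c', False)],
]

for bits in itertools.product([False, True], repeat=3):
    assign = dict(zip('abc', bits))
    sat = sum(clause_val(cl, assign) for cl in clauses)
    # three clauses always hold; all four hold exactly when a+b+c = 1 (mod 2)
    assert sat == (4 if sum(bits) % 2 == 1 else 3)
```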

The hardness of MAX-E3LIN is relevant to the PCP theorem: using MAX-E3LIN gadgets, Håstad was able to prove a very strong version of the PCP theorem, in which the verifier reads just three bits of a proof!

Theorem 6 (Håstad)

Let ${\varepsilon, \delta > 0}$. We have

$\displaystyle \mathbf{NP} \subseteq \mathbf{PCP}_{\frac{1}{2}+\varepsilon, 1-\delta}(3, O(\log n)).$

In other words, any ${L \in \mathbf{NP}}$ has a (non-adaptive) verifier with the following properties.

• The verifier uses ${O(\log n)}$ random bits, and queries just three (!) bits.
• The acceptance condition is either ${a+b+c=1}$ or ${a+b+c=0}$.
• If ${x \in L}$, then there is a proof ${\Pi}$ which is accepted with probability at least ${1-\delta}$.
• If ${x \notin L}$, then every proof is accepted with probability at most ${\frac{1}{2} + \varepsilon}$.

## 2. Label Cover

We will prove our main result by reducing from the LABEL-COVER. Recall LABEL-COVER is played as follows: we have a bipartite graph ${G = U \cup V}$, a set of keys ${K}$ for vertices of ${U}$ and a set of labels ${L}$ for ${V}$. For every edge ${e = \{u,v\}}$ there is a function ${\pi_e : L \rightarrow K}$ specifying a key ${k = \pi_e(\ell) \in K}$ for every label ${\ell \in L}$. The goal is to label the graph ${G}$ while maximizing the number of edges ${e}$ with compatible key-label pairs.

Approximating LABEL-COVER is NP-hard:

Theorem 7 (Hardness of LABEL-COVER)

The ${\eta}$ vs. ${1}$ decision problem for LABEL-COVER is NP-hard for every ${\eta > 0}$, provided ${|K|}$ and ${|L|}$ are sufficiently large in terms of ${\eta}$.

So for any ${\eta > 0}$, it is NP-hard to decide whether one can satisfy all edges or only fewer than an ${\eta}$ fraction of them.

## 3. Setup

We are going to construct a reduction from LABEL-COVER to MAX-E3LIN with the following guarantees:

• “Completeness”: If the LABEL-COVER instance is completely satisfiable, then we get a solution of value ${\ge 1 - \delta}$ in the resulting MAX-E3LIN.
• “Soundness”: If the LABEL-COVER instance has value ${\le \eta}$, then we get a solution of value ${\le \frac{1}{2} + \varepsilon}$ in the resulting MAX-E3LIN.

Thus given an oracle for MAX-E3LIN decision, we can obtain ${\eta}$ vs. ${1}$ decision for LABEL-COVER, which we know is hard.

The setup for this is quite involved, using a huge number of variables. Just to agree on some conventions:

Definition 8 (“Long Code”)

A ${K}$-indexed binary string ${x = (x_k)_k}$ is a ${\pm 1}$ sequence indexed by ${K}$; we can think of it as an element of ${\{\pm 1\}^K}$. An ${L}$-indexed binary string ${y = (y_\ell)_\ell}$ is defined similarly.

Now we initialize ${|U| \cdot 2^{|K|} + |V| \cdot 2^{|L|}}$ variables:

• At every vertex ${u \in U}$, we will create ${2^{|K|}}$ binary variables, one for every ${K}$-indexed binary string. It is better to collect these variables into a function

$\displaystyle f_u : \{\pm1\}^K \rightarrow \{\pm1\}.$

• Similarly, at every vertex ${v \in V}$, we will create ${2^{|L|}}$ binary variables, one for every ${L}$-indexed binary string, and collect these into a function

$\displaystyle g_v : \{\pm1\}^L \rightarrow \{\pm1\}.$


Next we generate the equations. Here’s the motivation: we want to do this in such a way that given a satisfying labelling for LABEL-COVER, nearly all the MAX-E3LIN equations can be satisfied. One idea is as follows: for every edge ${e}$, letting ${\pi = \pi_e}$,

• Take a ${K}$-indexed binary string ${x = (x_k)_k}$ at random. Take an ${L}$-indexed binary string ${y = (y_\ell)_\ell}$ at random.
• Define the ${L}$-indexed binary string ${z = (z_\ell)_\ell}$ by ${z_\ell = x_{\pi(\ell)} y_\ell}$.
• Write down the equation ${f_u(x) g_v(y) g_v(z) = +1}$ for the MAX-E3LIN instance.

Thus, assuming we had a valid labelling of the graph, we could let ${f_u}$ and ${g_v}$ be the dictator functions for the assigned keys and labels. In that case, ${f_u(x) = x_{\pi(\ell)}}$, ${g_v(y) = y_\ell}$, and ${g_v(z) = x_{\pi(\ell)} y_\ell}$, so the product is always ${+1}$.

Unfortunately, this has two fatal flaws:

1. This means a value-${1}$ instance of LABEL-COVER gives a value-${1}$ instance of MAX-E3LIN; but value-${1}$ instances of MAX-E3LIN are easy to detect by ${\mathbb F_2}$ linear algebra (Remark 2), so we need completeness ${1-\delta}$ to have a hope of working.
2. Right now we could also just set all variables to be ${+1}$, satisfying every equation while ignoring the LABEL-COVER instance entirely.

We fix both issues by modifying the equations as follows.

Definition 8 (Equations of reduction)

For every edge ${e}$, with ${\pi = \pi_e}$, we alter the construction and say

• Let ${x = (x_k)_k}$ and ${y = (y_\ell)_\ell}$ be random as before.
• Let ${n = (n_\ell)_\ell}$ be a random ${L}$-indexed binary string, drawn from a ${\delta}$-biased distribution (each ${n_\ell}$ is ${-1}$ with probability ${\delta}$, independently). Now define ${z = (z_\ell)_\ell}$ by

$\displaystyle z_\ell = x_{\pi(\ell)} y_\ell n_\ell .$

The ${n_\ell}$ are “noise” bits, which resolve the first problem by corrupting each bit of ${z}$ with probability ${\delta}$.

• Write down one of the following two equations with ${\frac{1}{2}}$ probability each:

\displaystyle \begin{aligned} f_u(x) g_v(y) g_v(z) &= +1 \\ f_u(x) g_v(y) g_v(-z) &= -1. \end{aligned}

This resolves the second issue.

This gives a set of ${O(|E|)}$ equations.
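In code, the generation of a single equation might look like the following sketch (function and variable names are mine; the actual reduction enumerates the equations with these probabilities as weights rather than sampling):

```python
import random

def generate_equation(pi_e, K_size, L_size, delta, rng):
    """Sample one MAX-E3LIN equation for an edge e, per the construction above."""
    x = [rng.choice([1, -1]) for _ in range(K_size)]
    y = [rng.choice([1, -1]) for _ in range(L_size)]
    noise = [-1 if rng.random() < delta else 1 for _ in range(L_size)]
    z = [x[pi_e[l]] * y[l] * noise[l] for l in range(L_size)]
    if rng.random() < 0.5:
        return x, y, z, +1                  # f_u(x) g_v(y) g_v(z)  = +1
    else:
        return x, y, [-t for t in z], -1    # f_u(x) g_v(y) g_v(-z) = -1

rng = random.Random(2016)
x, y, z, rhs = generate_equation([0, 1, 0], 2, 3, 0.05, rng)
```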

I claim this reduction works. So we need to prove the “completeness” and “soundness” claims above.

## 4. Proof of Completeness

Given a labelling of ${G}$ with value ${1}$, we simply let ${f_u}$ and ${g_v}$ be the dictator functions corresponding to this valid labelling, as described earlier. Then an equation fails only when the noise bit at the relevant label is corrupted, so in expectation a ${1-\delta}$ fraction of the equations are satisfied.
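The noiseless part of this computation can be checked mechanically: with dictator functions for a compatible key-label pair, both equation types hold for all ${x}$ and ${y}$ (the projection below is an arbitrary illustrative choice):

```python
import itertools

pi_e = [0, 1, 0]     # illustrative projection L -> K, with |K| = 2, |L| = 3
key, label = 0, 2    # a compatible pair: pi_e[label] == key

f = lambda x: x[key]      # dictator function at the chosen key
g = lambda y: y[label]    # dictator function at the chosen label

# with no noise (all n_ell = +1), both equation types always hold
for x in itertools.product([1, -1], repeat=2):
    for y in itertools.product([1, -1], repeat=3):
        z = tuple(x[pi_e[l]] * y[l] for l in range(3))
        assert f(x) * g(y) * g(z) == +1
        assert f(x) * g(y) * g(tuple(-t for t in z)) == -1
```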

## 5. A Fourier Computation

Before proving soundness, we will first need to explicitly compute the probability that an equation above is satisfied. Remember we generated an equation for ${e}$ based on random strings ${x}$, ${y}$, ${n}$.

For ${T \subseteq L}$, we define

$\displaystyle \pi^{\text{odd}}_e(T) = \left\{ k \in K \mid \left\lvert \pi_e^{-1}(k) \cap T \right\rvert \text{ is odd} \right\}.$

Thus ${\pi^{\text{odd}}_e}$ maps subsets of ${L}$ to subsets of ${K}$.

Remark 9

Note that ${|\pi^{\text{odd}}_e(T)| \le |T|}$, and that ${\pi^{\text{odd}}_e(T) \neq \varnothing}$ if ${|T|}$ is odd (the counts ${|\pi_e^{-1}(k) \cap T|}$ sum to ${|T|}$, so if ${|T|}$ is odd then some count is odd).
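Both facts are easy to verify on a small example (the projection ${\pi_e}$ below is made up):

```python
from collections import Counter

def pi_odd(pi_e, T):
    """pi_e^odd(T): the keys hit an odd number of times by labels in T."""
    cnt = Counter(pi_e[l] for l in T)
    return {k for k, c in cnt.items() if c % 2 == 1}

pi_e = {0: 'k0', 1: 'k1', 2: 'k0'}   # made-up projection L -> K
assert pi_odd(pi_e, {0, 1}) == {'k0', 'k1'}
assert pi_odd(pi_e, {0, 2}) == set()         # k0 is hit twice
assert pi_odd(pi_e, {0, 1, 2}) == {'k1'}     # |T| odd => nonempty image
```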

Lemma 10 (Edge Probability)

The probability that an equation generated for ${e = \{u,v\}}$ is true is

$\displaystyle \frac{1}{2} + \frac{1}{2} \sum_{\substack{T \subseteq L \\ |T| \text{ odd}}} (1-2\delta)^{|T|} \widehat g_v(T)^2 \widehat f_u(\pi^{\text{odd}}_e(T)).$

Proof: Omitted for now… $\Box$
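Though the proof is omitted, the formula can be checked numerically: enumerate all ${x}$, ${y}$, ${n}$ exactly (weighting the noise bits by their probabilities) and compare against the Fourier expression. The functions ${f}$, ${g}$ and the projection below are arbitrary choices:

```python
import itertools
from math import prod

K_n, L_n = 2, 3
pi = [0, 1, 0]    # projection pi_e : L -> K (arbitrary choice)
delta = 0.1

f = lambda x: x[0] * x[1]                      # arbitrary f : {+-1}^K -> {+-1}
g = lambda y: y[0] if y[1] == y[2] else -y[2]  # arbitrary g : {+-1}^L -> {+-1}

def fourier(h, n):
    """h_hat(S) = E_x[h(x) chi_S(x)] over uniform x in {+-1}^n."""
    pts = list(itertools.product([1, -1], repeat=n))
    subsets = itertools.chain.from_iterable(
        itertools.combinations(range(n), r) for r in range(n + 1))
    return {frozenset(S): sum(h(x) * prod(x[i] for i in S) for x in pts) / len(pts)
            for S in subsets}

def pi_odd(T):
    return frozenset(k for k in range(K_n)
                     if sum(1 for l in T if pi[l] == k) % 2 == 1)

# exact probability that the random equation for this edge is satisfied
total = 0.0
for x in itertools.product([1, -1], repeat=K_n):
    for y in itertools.product([1, -1], repeat=L_n):
        for n_bits in itertools.product([1, -1], repeat=L_n):
            w = prod(delta if b == -1 else 1 - delta for b in n_bits)
            z = tuple(x[pi[l]] * y[l] * n_bits[l] for l in range(L_n))
            mz = tuple(-t for t in z)
            sat = 0.5 * (f(x) * g(y) * g(z) == 1) \
                + 0.5 * (f(x) * g(y) * g(mz) == -1)
            total += w * sat
prob = total / (2 ** K_n * 2 ** L_n)

fh, gh = fourier(f, K_n), fourier(g, L_n)
formula = 0.5 + 0.5 * sum((1 - 2 * delta) ** len(T) * gh[T] ** 2 * fh[pi_odd(T)]
                          for T in gh if len(T) % 2 == 1)
assert abs(prob - formula) < 1e-9
```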

## 6. Proof of Soundness

We will go in the reverse direction and show (constructively) that if the MAX-E3LIN instance has a solution with value ${\ge\frac{1}{2}+2\varepsilon}$, then we can reconstruct a solution to LABEL-COVER with value ${\ge \eta}$. (The use of ${2\varepsilon}$ here will become clear in a moment.) This process is called “decoding”.

The idea is as follows: if ${S}$ is a small set such that ${\widehat f_u(S)}$ is large, then we can pick a key from ${S}$ at random for ${f_u}$; compare this with the dictator functions where ${\widehat f_u(S) = 1}$ and ${|S| = 1}$. We want to do something similar with ${T}$.

Here are the concrete details. Let ${\Lambda = \frac{\log(1/\varepsilon)}{2\delta}}$ and ${\eta = \frac{\varepsilon^3}{\Lambda^2}}$ be constants (the actual values arise later).

Definition 11

We say that a nonempty set ${S \subseteq K}$ of keys is heavy for ${u}$ if

$\displaystyle \left\lvert S \right\rvert \le \Lambda \qquad\text{and}\qquad \left\lvert \widehat{f_u}(S) \right\rvert \ge \varepsilon.$

Note that there are at most ${\varepsilon^{-2}}$ heavy sets by Parseval.

Definition 12

We say that a nonempty set ${T \subseteq L}$ of labels is ${e}$-excellent for ${v}$ if

$\displaystyle \left\lvert T \right\rvert \le \Lambda \qquad\text{and}\qquad S = \pi^{\text{odd}}_e(T) \text{ is heavy.}$

In particular ${S \neq \varnothing}$ so at least one compatible key-label pair is in ${S \times T}$.

Notice that, unlike the case with ${S}$, the criterion for “good” ${T}$ actually depends on the edge ${e}$ in question! This makes it easier to select keys than to select labels. In order to pick labels, we will have to choose from a ${\widehat g_v^2}$ distribution.

Lemma 13 (At least ${\varepsilon}$ of ${T}$ are excellent)

For any edge ${e = \{u,v\}}$, at least ${\varepsilon}$ of the possible ${T}$ according to the distribution ${\widehat g_v^2}$ are ${e}$-excellent.

Proof: Applying an averaging argument to the inequality

$\displaystyle \sum_{\substack{T \subseteq L \\ |T| \text{ odd}}} (1-2\delta)^{|T|} \widehat g_v(T)^2 \left\lvert \widehat f_u(\pi^{\text{odd}}(T)) \right\rvert \ge 2\varepsilon$

shows there is at least ${\varepsilon}$ chance that ${|T|}$ is odd and satisfies

$\displaystyle (1-2\delta)^{|T|} \left\lvert \widehat f_u(S) \right\rvert \ge \varepsilon$

where ${S = \pi^{\text{odd}}_e(T)}$. Since ${\lvert \widehat f_u(S) \rvert \le 1}$, this forces ${(1-2\delta)^{|T|} \ge \varepsilon}$, i.e. ${|T| \le \Lambda}$; and since ${(1-2\delta)^{|T|} \le 1}$, also ${\lvert \widehat f_u(S) \rvert \ge \varepsilon}$. Finally, by Remark 9, ${|S| \le |T| \le \Lambda}$, so ${S}$ is heavy. $\Box$
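For completeness, the averaging step is the standard estimate that a random variable ${X \le 1}$ with mean at least ${2\varepsilon}$ exceeds ${\varepsilon}$ with probability at least ${\varepsilon}$:

$\displaystyle 2\varepsilon \le \mathbb E[X] \le \Pr[X \ge \varepsilon] \cdot 1 + \varepsilon,$

so ${\Pr[X \ge \varepsilon] \ge \varepsilon}$. Here ${X = (1-2\delta)^{|T|} \lvert \widehat f_u(\pi^{\text{odd}}_e(T)) \rvert \cdot \mathbf 1_{|T| \text{ odd}}}$, with ${T}$ drawn from the ${\widehat g_v^2}$ distribution.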

Now, use the following algorithm.

• For every vertex ${u \in U}$, take the union of all heavy sets, say

$\displaystyle \mathcal H = \bigcup_{S \text{ heavy}} S.$

Pick a random key from ${\mathcal H}$. Note that ${|\mathcal H| \le \Lambda\varepsilon^{-2}}$, since there are at most ${\varepsilon^{-2}}$ heavy sets (by Parseval) and each has at most ${\Lambda}$ elements.

• For every vertex ${v \in V}$, select a random set ${T}$ according to the distribution ${\widehat g_v(T)^2}$, and select a random element from ${T}$.
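The two steps above can be sketched in code (the constants are illustrative stand-ins for ${\Lambda}$ and ${\varepsilon}$, and the helper names are mine):

```python
import itertools
import random
from math import prod

LAM, EPS = 4, 0.3    # stand-ins for Lambda and epsilon
rng = random.Random(7)

def fourier(h, n):
    """All Fourier coefficients h_hat(S) of h : {+-1}^n -> {+-1}."""
    pts = list(itertools.product([1, -1], repeat=n))
    subs = itertools.chain.from_iterable(
        itertools.combinations(range(n), r) for r in range(n + 1))
    return {frozenset(S): sum(h(x) * prod(x[i] for i in S) for x in pts) / len(pts)
            for S in subs}

def decode_u(f_u, K_size):
    """Key for u: a uniform random element of the union of heavy sets."""
    fh = fourier(f_u, K_size)
    heavy = [S for S, c in fh.items() if S and len(S) <= LAM and abs(c) >= EPS]
    H = set().union(*heavy) if heavy else set(range(K_size))
    return rng.choice(sorted(H))

def decode_v(g_v, L_size):
    """Label for v: draw T with probability g_hat(T)^2, then a uniform element."""
    gh = fourier(g_v, L_size)
    Ts = list(gh)
    T = rng.choices(Ts, weights=[gh[t] ** 2 for t in Ts])[0]
    return rng.choice(sorted(T)) if T else rng.randrange(L_size)

# dictator functions decode back to their own coordinate
assert decode_u(lambda x: x[0], 2) == 0
assert decode_v(lambda y: y[1], 3) == 1
```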

I claim that this works.

Fix an edge ${e}$. There is at least an ${\varepsilon}$ chance that ${T}$ is ${e}$-excellent. If it is, then there is at least one compatible pair in ${\mathcal H \times T}$. Hence we conclude the probability of success is at least

$\displaystyle \varepsilon \cdot \frac{1}{\Lambda \varepsilon^{-2}} \cdot \frac{1}{\Lambda} = \frac{\varepsilon^3}{\Lambda^2} = \eta.$

(Addendum: it’s pointed out to me this isn’t quite right; the overall probability of the equation given by an edge ${e}$ is ${\ge \frac{1}{2}+\varepsilon}$, but this doesn’t imply it for every edge. Thus one likely needs to do another averaging argument.)