Joyal’s Proof of Cayley’s Tree Formula

I wanted to quickly write this proof up, complete with pictures, so that I won’t forget it again. In this post I’ll give a combinatorial proof (due to Joyal) of the following:

Theorem 1 (Cayley’s Formula)

The number of trees on {n} labelled vertices is {n^{n-2}}.

Proof: We are going to construct a bijection between

  • Functions {\{1, 2, \dots, n\} \rightarrow \{1, 2, \dots, n\}} (of which there are {n^n}) and
  • Trees on {\{1, 2, \dots, n\}} with two distinguished nodes {A} and {B} (possibly {A=B}).

This will imply the answer.

Let’s look at the first piece of data. We can visualize it as {n} points floating around, each with an arrow going out of it pointing to another point, but possibly with many other arrows coming into it. Such a structure is apparently called a directed pseudoforest. Here is an example when {n = 9}.


You’ll notice that in each component, some of the points lie in a cycle and others do not. I’ve colored the former type of points blue, and the corresponding arrows magenta.

Thus a directed pseudoforest can also be specified by

  • a choice of some vertices to be in cycles (blue vertices),
  • a permutation on the blue vertices (magenta arrows), and
  • attachments of trees to the blue vertices (grey vertices and arrows).

Now suppose we take the same information, but replace the permutation on the blue vertices with a total ordering instead (of course there are an equal number of these). Then we can string the blue vertices together as shown below, where the green arrows denote the selected total ordering (in this case {1 < 9 < 2 < 4 < 8 < 5}):


This is exactly the data of a tree on the {n} vertices with two distinguished vertices, the first and last in the chain of green (which could possibly coincide). \Box


Combinatorial Nullstellensatz and List Coloring

More than six months late, but here are notes from the combinatorial nullsetllensatz talk I gave at the student colloquium at MIT. This was also my term paper for 18.434, “Seminar in Theoretical Computer Science”.

1. Introducing the choice number

One of the most fundamental problems in graph theory is that of a graph coloring, in which one assigns a color to every vertex of a graph so that no two adjacent vertices have the same color. The most basic invariant related to the graph coloring is the chromatic number:

Definition 1

A simple graph {G} is {k}-colorable if it’s possible to properly color its vertices with {k} colors. The smallest such {k} is the chromatic number {\chi(G)}.

In this exposition we study a more general notion in which the set of permitted colors is different for each vertex, as long as at least {k} colors are listed at each vertex. This leads to the notion of a so-called choice number, which was introduced by Erdös, Rubin, and Taylor.

Definition 2

A simple graph {G} is {k}-choosable if its possible to properly color its vertices given a list of {k} colors at each vertex. The smallest such {k} is the choice number {\mathop{\mathrm{ch}}(G)}.

Example 3

We have {\mathop{\mathrm{ch}}(C_{2n}) = \chi(C_{2n}) = 2} for any integer {n} (here {C_{2n}} is the cycle graph on {2n} vertices). To see this, we only have to show that given a list of two colors at each vertex of {C_{2n}}, we can select one of them.

  • If the list of colors is the same at each vertex, then since {C_{2n}} is bipartite, we are done.
  • Otherwise, suppose adjacent vertices {v_1}, {v_{2n}} are such that some color at {c} is not in the list at {v_{2n}}. Select {c} at {v_1}, and then greedily color in {v_2}, \dots, {v_{2n}} in that order.

We are thus naturally interested in how the choice number and the chromatic number are related. Of course we always have

\displaystyle \mathop{\mathrm{ch}}(G) \ge \chi(G).

Näively one might expect that we in fact have an equality, since allowing the colors at vertices to be different seems like it should make the graph easier to color. However, the following example shows that this is not the case.

Example 4 (Erdös)

Let {n \ge 1} be an integer and define

\displaystyle G = K_{n^n, n}.

We claim that for any integer {n \ge 1} we have

\displaystyle \mathop{\mathrm{ch}}(G) \ge n+1 \quad\text{and}\quad \chi(G) = 2.

The latter equality follows from {G} being partite.

Now to see the first inequality, let {G} have vertex set {U \cup V}, where {U} is the set of functions {u : [n] \rightarrow [n]} and {V = [n]}. Then consider {n^2} colors {C_{i,j}} for {1 \le i, j \le n}. On a vertex {u \in U}, we list colors {C_{1,u(1)}}, {C_{2,u(2)}}, \dots, {C_{n,u(n)}}. On a vertex {v \in V}, we list colors {C_{v,1}}, {C_{v,2}}, \dots, {C_{v,n}}. By construction it is impossible to properly color {G} with these colors.

The case {n = 3} is illustrated in the figure below (image in public domain).


This surprising behavior is the subject of much research: how can we bound the choice number of a graph as a function of its chromatic number and other properties of the graph? We see that the above example requires exponentially many vertices in {n}.

Theorem 5 (Noel, West, Wu, Zhu)

If {G} is a graph with {n} vertices then

\displaystyle \chi(G) \le \mathop{\mathrm{ch}}(G) \le \max\left( \chi(G), \left\lceil \frac{\chi(G)+n-1}{3} \right\rceil \right).

In particular, if {n \le 2\chi(G)+1} then {\mathop{\mathrm{ch}}(G) = \chi(G)}.

One of the most major open problems in this direction is the following.

Definition 6

A claw-free graph is a graph with no induced {K_{3,1}}. For example, the line graph (also called edge graph) of any simple graph {G} is claw-free.

If {G} is a claw-free graph, then {\mathop{\mathrm{ch}}(G) = \chi(G)}. In particular, this conjecture implies that for edge coloring, the notions of “chromatic number” and “choice number” coincide.


In this exposition, we prove the following result of Alon.

Theorem 7 (Alon)

A bipartite graph {G} is {\left\lfloor L(G) \right\rfloor+1} choosable, where

\displaystyle L(G) \overset{\mathrm{def}}{=} \max_{H \subseteq G} |E(H)|/|V(H)|

is half the maximum of the average degree of subgraphs {H}.

In particular, recall that a planar bipartite graph {H} with {r} vertices contains at most {2r-4} edges. Thus for such graphs we have {L(G) \le 2} and deduce:

Corollary 8

A planar bipartite graph is {3}-choosable.

This corollary is sharp, as it applies to {K_{2,4}} which we have seen in Example 4 has {\mathop{\mathrm{ch}}(K_{2,4}) = 3}.

The rest of the paper is divided as follows. First, we begin in §2 by stating Theorem 9, the famous combinatorial nullstellensatz of Alon. Then in §3 and §4, we provide descriptions of the so-called graph polynomial, to which we then apply combinatorial nullstellensatz to deduce Theorem 18. Finally in §5, we show how to use Theorem 18 to prove Theorem 7.

2. Combinatorial Nullstellensatz

The main tool we use is the Combinatorial Nullestellensatz of Alon.

Theorem 9 (Combinatorial Nullstellensatz)

Let {F} be a field, and let {f \in F[x_1, \dots, x_n]} be a polynomial of degree {t_1 + \dots + t_n}. Let {S_1, S_2, \dots, S_n \subseteq F} such that {\left\lvert S_i \right\rvert > t_i} for all {i}.

Assume the coefficient of {x_1^{t_1}x_2^{t_2}\dots x_n^{t_n}} of {f} is not zero. Then we can pick {s_1 \in S_1}, \dots, {s_n \in S_n} such that

\displaystyle f(s_1, s_2, \dots, s_n) \neq 0.

Example 10

Let us give a second proof that

\displaystyle \mathop{\mathrm{ch}}(C_{2n}) = 2

for every positive integer {n}. Our proof will be an application of the Nullstellensatz.

Regard the colors as real numbers, and let {S_i} be the set of colors at vertex {i} (hence {1 \le i \le 2n}, and {|S_i| = 2}). Consider the polynomial

\displaystyle f = \left( x_1-x_2 \right)\left( x_2-x_3 \right) \dots \left( x_{2n-1}-x_{2n} \right)\left( x_{2n}-x_1 \right)

The coefficient of {x_1^1 x_2^1 \dots x_{2n}^1} is {2 \neq 0}. Therefore, one can select a color from each {S_i} so that {f} does not vanish.

3. The Graph Polynomial, and Directed Orientations

Motivated by Example 10, we wish to apply a similar technique to general graphs {G}. So in what follows, let {G} be a (simple) graph with vertex set {\{1, \dots, n\}}.

Definition 11

The graph polynomial of {G} is defined by

\displaystyle f_G(x_1, \dots, x_n) = \prod_{\substack{(i,j) \in E(G) \\ i < j}} (x_i-x_j).

We observe that coefficients of {f_G} correspond to differences in directed orientations. To be precise, we introduce the notation:

Definition 12

Consider orientations on the graph {G} with vertex set {\{1, \dots, n\}}, meaning we assign a direction {v \rightarrow w} to every edge of {G} to make it into a directed graph {G}. An oriented edge is called ascending if {v \rightarrow w} and {v \le w}, i.e. the edge points from the smaller number to the larger one.

Then we say that an orientation is

  • even if there are an even number of ascending edges, and
  • odd if there are an odd number of ascending edges.

Finally, we define

  • {\mathop{\mathrm{DE}}_G(d_1, \dots, d_n)} to the be set of all even orientations of {G} in which vertex {i} has indegree {d_i}.
  • {\mathop{\mathrm{DO}}_G(d_1, \dots, d_n)} to the be set of all odd orientations of {G} in which vertex {i} has indegree {d_i}.

Set {\mathop{\mathrm{D}}_G(d_1,\dots,d_n) = \mathop{\mathrm{DE}}_G(d_1,\dots,d_n) \cup \mathop{\mathrm{DO}}_G(d_1,\dots,d_n)}.

Example 13

Consider the following orientation:

even-orientationThere are exactly two ascending edges, namely {1 \rightarrow 2} and {2 \rightarrow 4}. The indegrees of are {d_1 = 0}, {d_2 = 2} and {d_3 = d_4 = 1}. Therefore, this particular orientation is an element of {\mathop{\mathrm{DE}}_G(0,2,1,1)}. In terms of {f_G}, this corresponds to the choice of terms

\displaystyle \left( x_1- \boldsymbol{x_2} \right) \left( \boldsymbol{x_2}-x_3 \right) \left( x_2-\boldsymbol{x_4} \right) \left( \boldsymbol{x_3}-x_4 \right)

which is a {+ x_2^2 x_3 x_4} term.

Lemma 14

In the graph polynomial of {G}, the coefficient of {x_1^{d_1} \dots x_n^{d_n}} is

\displaystyle \left\lvert \mathop{\mathrm{DE}}_G(d_1, \dots, d_n) \right\rvert - \left\lvert \mathop{\mathrm{DO}}_G(d_1, \dots, d_n) \right\rvert.

Proof: Consider expanding {f_G}. Then each expanded term corresponds to a choice of {x_i} or {x_j} from each {(i,j)}, as in Example 13. The term has coefficient {+1} is the orientation is even, and {-1} if the orientation is odd, as desired. \Box

Thus we have an explicit combinatorial description of the coefficients in the graph polynomial {f_G}.

4. Coefficients via Eulerian Suborientations

We now give a second description of the coefficients of {f_G}.

Definition 15

Let {D \in \mathop{\mathrm{D}}_G(d_1, \dots, d_n)}, viewed as a directed graph. An Eulerian suborientation of {D} is a subgraph of {D} (not necessarily induced) in which every vertex has equal indegree and outdegree. We say that such a suborientation is

  • even if it has an even number of edges, and
  • odd if it has an odd number of edges.

Note that the empty suborientation is allowed. We denote the even and odd Eulerian suborientations of {D} by {\mathop{\mathrm{EE}}(D)} and {\mathop{\mathrm{EO}}(D)}, respectively.

Eulerian suborientations are brought into the picture by the following lemma.

Lemma 16

Assume {D \in \mathop{\mathrm{DE}}_G(d_1, \dots, d_n)}. Then there are natural bijections

\displaystyle \begin{aligned} \mathop{\mathrm{DE}}_G(d_1, \dots, d_n) &\rightarrow \mathop{\mathrm{EE}}(D) \\ \mathop{\mathrm{DO}}_G(d_1, \dots, d_n) &\rightarrow \mathop{\mathrm{EO}}(D). \end{aligned}

Similarly, if {D \in \mathop{\mathrm{DO}}_G(d_1, \dots, d_n)} then there are bijections

\displaystyle \begin{aligned} \mathop{\mathrm{DE}}_G(d_1, \dots, d_n) &\rightarrow \mathop{\mathrm{EO}}(D) \\ \mathop{\mathrm{DO}}_G(d_1, \dots, d_n) &\rightarrow \mathop{\mathrm{EE}}(D). \end{aligned}

Proof: Consider any orientation {D' \in \mathop{\mathrm{D}}_G(d_1, \dots, d_n)}, Then we define a suborietation of {D}, denoted {D \rtimes D'}, by including exactly the edges of {D} whose orientation in {D'} is in the opposite direction. It’s easy to see that this induces a bijection

\displaystyle D \rtimes - : \mathop{\mathrm{D}}_G(d_1, \dots, d_n) \rightarrow \mathop{\mathrm{EE}}(D) \cup \mathop{\mathrm{EO}}(D)

Moreover, remark that

  • {D \rtimes D'} is even if {D} and {D'} are either both even or both odd, and
  • {D \rtimes D'} is odd otherwise.

The lemma follows from this. \Box

Corollary 17

In the graph polynomial of {G}, the coefficient of {x_1^{d_1} \dots x_n^{d_n}} is

\displaystyle \pm \left( \left\lvert \mathop{\mathrm{EE}}(D) \right\rvert - \left\lvert \mathop{\mathrm{EO}}(D) \right\rvert \right)

where {D \in \mathop{\mathrm{D}}_G(d_1, \dots, d_n)} is arbitrary.

Proof: Combine Lemma 14 and Lemma 16. \Box

We now arrive at the main result:

Theorem 18

Let {G} be a graph on {\{1, \dots, n\}}, and let {D \in \mathop{\mathrm{D}}_G(d_1, \dots, d_n)} be an orientation of {G}. If {\left\lvert \mathop{\mathrm{EE}}(D) \right\rvert \neq \left\lvert \mathop{\mathrm{EO}}(D) \right\rvert}, then given a list of {d_i+1} colors at each vertex of {G}, there exists a proper coloring of the vertices of {G}.

In particular, {G} is {(1+\max_i d_i)}-choosable.

Proof: Combine Corollary 17 with Theorem 9. \Box

5. Finding an orientation

Armed with Theorem 18, we are almost ready to prove Theorem 7. The last ingredient is that we need to find an orientation on {G} in which the maximal degree is not too large. This is accomplished by the following.

Lemma 19

Let {L(G) \overset{\mathrm{def}}{=} \max_{H \subseteq G} |E(H)|/|V(H)|} as in Theorem 7. Then {G} has an orientation in which every indegree is at most {\left\lceil L(G) \right\rceil}.

Proof: This is an application of Hall’s marriage theorem.

Let {d = \left\lceil L(G) \right\rceil \ge L(G)}. Construct a bipartite graph

\displaystyle E \cup X \qquad \text{where}\qquad E = E(G) \quad\text{ and }\quad X = \underbrace{V(G) \sqcup \dots \sqcup V(G)}_{d \text{ times}}.

Connect {e \in E} and {v \in X} if {v} is an endpoint of {e}. Since {d \ge L(G)} we satisfy Hall’s condition (as {L(G)} is a condition for all subgraphs {H \subseteq G}) and can match each edge in {E} to a (copy of some) vertex in {X}. Since there are exactly {d} copies of each vertex in {X}, the conclusion follows. \Box

Now we can prove Theorem 7. Proof: According to Lemma 19, pick {D \in \mathop{\mathrm{D}}_G(d_1, \dots, d_n)} where {\max d_i \le \left\lceil L(G) \right\rceil}. Since {G} is bipartite, we obviously have {\mathop{\mathrm{EO}}(D) = \varnothing}, since {G} cannot have any odd cycles. So Theorem 18 applies and we are done. \Box

Formal vs Functional Series (OR: Generating Function Voodoo Magic)

Epistemic status: highly dubious. I found almost no literature doing anything quite like what follows, which unsettles me because it makes it likely that I’m overcomplicating things significantly.

1. Synopsis

Recently I was working on an elegant problem which was the original problem 6 for the 2015 International Math Olympiad, which reads as follows:


[IMO Shortlist 2015 Problem C6] Let {S} be a nonempty set of positive integers. We say that a positive integer {n} is clean if it has a unique representation as a sum of an odd number of distinct elements from {S}. Prove that there exist infinitely many positive integers that are not clean.

Proceeding by contradiction, one can prove (try it!) that in fact all sufficiently large integers have exactly one representation as a sum of an even subset of {S}. Then, the problem reduces to the following:


Show that if {s_1 < s_2 < \dots} is an increasing sequence of positive integers and {P(x)} is a nonzero polynomial then we cannot have

\displaystyle \prod_{j=1}^\infty (1 - x^{s_j}) = P(x)

as formal series.

To see this, note that all sufficiently large {x^N} have coefficient {1 + (-1) = 0}. Now, the intuitive idea is obvious: the root {1} appears with finite multiplicity in {P} so we can put {P(x) = (1-x)^k Q(x)} where {Q(1) \neq 0}, and then we get that {1-x} on the RHS divides {P} too many times, right?

Well, there are some obvious issues with this “proof”: for example, consider the equality

\displaystyle 1 = (1-x)(1+x)(1+x^2)(1+x^4)(1+x^8) \dots.

The right-hand side is “divisible” by {1-x}, but the left-hand side is not (as a polynomial).

But we still want to use the idea of plugging {x \rightarrow 1^-}, so what is the right thing to do? It turns out that this is a complete minefield, and there are a lot of very subtle distinctions that seem to not be explicitly mentioned in many places. I think I have a complete answer now, but it’s long enough to warrant this entire blog post.

Here’s the short version: there’s actually two distinct notions of “generating function”, namely a “formal series” and “functional series”. They use exactly the same notation but are two different types of objects, and this ends up being the source of lots of errors, because “formal series” do not allow substituting {x}, while “functional series” do.

Spoiler: we’ll need the asymptotic for the partition function {p(n)}.

2. Formal Series {\neq} Functional Series

I’m assuming you’ve all heard the definition of {\sum_k c_kx^k}. It turns out unfortunately that this isn’t everything: there are actually two types of objects at play here. They are usually called formal power series and power series, but for this post I will use the more descriptive names formal series and functional series. I’ll do everything over {\mathbb C}, but one can of course use {\mathbb R} instead.

The formal series is easier to describe:

Definition 1

A formal series {F} is an infinite sequence {(a_n)_n = (a_0, a_1, a_2, \dots)} of complex numbers. We often denote it by {\sum a_nx^n = a_0 + a_1x + a_2x^2 + \dots}. The set of formal series is denoted {\mathbb C[ [x] ]}.

This is the “algebraic” viewpoint: it’s a sequence of coefficients. Note that there is no worry about convergence issues or “plugging in {x}”.

On the other hand, a functional series is more involved, because it has to support substitution of values of {x} and worry about convergence issues. So here are the necessary pieces of data:

Definition 2

A functional series {G} (centered at zero) is a function {G : U \rightarrow \mathbb C}, where {U} is an open disk centered at {0} or {U = \mathbb C}. We require that there exists an infinite sequence {(c_0, c_1, c_2, \dots)} of complex numbers satisfying

\displaystyle \forall z \in U: \qquad G(z) = \lim_{N \rightarrow \infty} \left( \sum_{k=0}^N c_k z^k \right).

(The limit is take in the usual metric of {\mathbb C}.) In that case, the {c_i} are unique and called the coefficients of {G}.

This is often written as {G(x) = \sum_n c_n x^n}, with the open set {U} suppressed.

Remark 3

Some remarks on the definition of functional series:

  • This is enough to imply that {G} is holomorphic (and thus analytic) on {U}.
  • For experts: note that I’m including the domain {U} as part of the data required to specify {G}, which makes the presentation cleaner. Most sources do something with “radius of convergence”; I will blissfully ignore this, leaving this data implicitly captured by {U}.
  • For experts: Perhaps non-standard, {U \neq \{0\}}. Otherwise I can’t take derivatives, etc.

Thus formal and functional series, despite having the same notation, have different types: a formal series {F} is a sequence, while a functional series {G} is a function that happens to be expressible as an infinite sum within its domain.

Of course, from every functional series {G} we can extract its coefficients and make them into a formal series {F}. So, for lack of better notation:

Definition 4

If {F = (a_n)_n} is a formal series, and {G : U \rightarrow \mathbb C} is a functional series whose coefficients equal {F}, then we write {F \simeq G}.

3. Finite operations

Now that we have formal and functional series, we can define sums. Since these are different types of objects, we will have to run definitions in parallel and then ideally check that they respect {\simeq}.

For formal series:

Definition 5

Let {F_1 = (a_n)_n} and {F_2 = (b_n)_n} be formal series. Then we set

\displaystyle \begin{aligned} (a_n)_n \pm (b_n)_n &= (a_n \pm b_n)_n \\ (a_n)_n \cdot (b_n)_n &= \left( \textstyle\sum_{j=0}^n a_jb_{n-j} \right)_n. \end{aligned}

This makes {\mathbb C[ [x] ]} into a ring, with identity {(0,0,0,\dots)} and {(1,0,0,\dots)}.

We also define the derivative {F = (a_n)_n} by {F' = ((n+1)a_{n+1})_n}.

It’s probably more intuitive to write these definitions as

\displaystyle \begin{aligned} \sum_n a_n x^n \pm \sum_n b_n x^n &= \sum_n (a_n \pm b_n) x^n \\ \left( \sum_n a_n x^n \right) \left( \sum_n b_n x^n \right) &= \sum_n \left( \sum_{j=0}^n a_jb_{n-j} \right) x^n \\ \left( \sum_n a_n x^n \right)' &= \sum_n na_n x^{n-1} \end{aligned}

and in what follows I’ll start to use {\sum_n a_nx^n} more. But officially, all definitions for formal series are in terms of the coefficients alone; these presence of {x} serves as motivation only.

Exercise 6

Show that if {F = \sum_n a_nx^n} is a formal series, then it has a multiplicative inverse if and only if {a_0 \neq 0}.

On the other hand, with functional series, the above operations are even simpler:

Definition 7

Let {G_1 : U \rightarrow \mathbb C} and {G_2 : U \rightarrow \mathbb C} be functional series with the same domain {U}. Then {G_1 \pm G_2} and {G_1 \cdot G_2} are defined pointwise.

If {G : U \rightarrow \mathbb C} is a functional series (hence holomorphic), then {G'} is defined poinwise.

If {G} is nonvanishing on {U}, then {1/G : U \rightarrow \mathbb C} is defined pointwise (and otherwise is not defined).

Now, for these finite operations, everything works as you expect:

Theorem 8 (Compatibility of finite operations)

Suppose {F}, {F_1}, {F_2} are formal series, and {G}, {G_1}, {G_2} are functional series {U \rightarrow \mathbb C}. Assume {F \simeq G}, {F_1 \simeq G_1}, {F_2 \simeq G_2}.

  • {F_1 \pm F_2 \simeq G_1 \pm G_2}, {F_1 \cdot F_2 = G_1 \cdot G_2}.
  • {F' \simeq G'}.
  • If {1/G} is defined, then {1/F} is defined and {1/F \simeq 1/G}.

So far so good: as long as we’re doing finite operations. But once we step beyond that, things begin to go haywire.

4. Limits

We need to start considering limits of {(F_k)_k} and {(G_k)_k}, since we are trying to make progress towards infinite sums and products. Once we do this, things start to burn.

Definition 9

Let {F_1 = \sum_n a_n x^n} and {F_2 = \sum_n b_n x^n} be formal series, and define the difference by

\displaystyle d(F_1, F_2) = \begin{cases} 2^{-n} & a_n \neq b_n, \; n \text{ minimal} \\ 0 & F_1 = F_2. \end{cases}

This function makes {\mathbb C[[x]]} into a metric space, so we can discuss limits in this space. Actually, it is a normed vector space obtained by {\left\lVert F \right\rVert = d(F,0)} above.

Thus, {\lim_{k \rightarrow \infty} F_k = F} if each coefficient of {x^n} eventually stabilizes as {k \rightarrow \infty}. For example, as formal series we have that {(1,-1,0,0,\dots)}, {(1,0,-1,0,\dots)}, {(1,0,0,-1,\dots)} converges to {1 = (1,0,0,0\dots)}, which we write as

\displaystyle \lim_{k \rightarrow \infty} (1 - x^k) = 1 \qquad \text{as formal series}.

As for functional series, since they are functions on the same open set {U}, we can use pointwise convergence or the stronger uniform convergence; we’ll say explicitly which one we’re doing.

Example 10 (Limits don’t work at all)

In what follows, {F_k \simeq G_k} for every {k}.

  • Here is an example showing that if {\lim_k F_k = F}, the functions {G_k} may not converge even pointwise. Indeed, just take {F_k = 1 - x^k} as before, and let {U = \{ z : |z| < 2 \}}.
  • Here is an example showing that even if {G_k \rightarrow G} uniformly, {\lim_k F_k} may not exist. Take {G_k = 1 - 1/k} as constant functions. Then {G_k \rightarrow 1}, but {\lim_k F_k} doesn’t exist because the constant term never stabilizes (in the combinatorial sense).
  • The following example from this math.SE answer by Robert Israel shows that it’s possible that {F = \lim_k F_k} exists, and {G_k \rightarrow G} pointwise, and still {F \not\simeq G}. Let {U} be the open unit disk, and set

    \displaystyle \begin{aligned} A_k &= \{z = r e^{i\theta} \mid 2/k \le r \le 1, \; 0 \le \theta \le 2\pi - 1/k\} \\ B_k &= \left\{ |z| \le 1/k \right\} \end{aligned}

    for {k \ge 1}. By Runge theorem there’s a polynomial {p_k(z)} such that

    \displaystyle |p_k(z) - 1/z^{k}| < 1/k \text{ on } A_k \qquad \text{and} \qquad |p_k(z)| < 1/k \text{ on }B_k.


    \displaystyle G_k(z) = z^{k+1} p_k(z)

    is the desired counterexample (with {F_k} being the sequence of coefficients from {G}). Indeed by construction {\lim_k F_k = 0}, since {\left\lVert F_k \right\rVert \le 2^{-k}} for each {k}. Alas, {|g_k(z) - z| \le 2/k} for {z \in A_k \cup B_k}, so {G_k \rightarrow z} converges pointwise to the identity function.

To be fair, we do have the following saving grace:

Theorem 11 (Uniform convergence and both limits exist is sufficient)

Suppose that {G_k \rightarrow G} converges uniformly. Then if {F_k \simeq G_k} for every {k}, and {\lim_k F_k = F}, then {F \simeq G}.

Proof: Here is a proof, copied from this math.SE answer by Joey Zhou. WLOG {G = 0}, and let {g_n(z) = \sum{a^{(n)}_kz^k}}. It suffices to show that {a_k = 0} for all {k}. Choose any {0<r<1}. By Cauchy’s integral formula, we have

\displaystyle \begin{aligned} \left|a_k - a^{(n)}_k\right| &= \left|\frac{1}{2\pi i} \int\limits_{|z|=r}{\frac{g(z)-g_n(z)}{z^{n+1}}\text{ d}z}\right| \\ & \le\frac{1}{2\pi}(2\pi r)\frac{1}{r^{n+1}}\max\limits_{|z|=r}{|g(z)-g_n(z)|} \xrightarrow{n\rightarrow\infty} 0 \end{aligned}

since {g_n} converges uniformly to {g} on {U}. Hence, {a_k = \lim\limits_{n\rightarrow\infty}{a^{(n)}_k}}. Since {a^{(n)}_k = 0} for {n\ge k}, the result follows. \Box

The take-away from this section is that limits are relatively poorly behaved.

5. Infinite sums and products

Naturally, infinite sums and products are defined by taking the limit of partial sums and limits. The following example (from math.SE again) shows the nuances of this behavior.

Example 12 (On {e^{1+x}})

The expression

\displaystyle \sum_{n=0}^\infty \frac{(1+x)^n}{n!} = \lim_{N \rightarrow \infty} \sum_{n=0}^N \frac{(1+x)^n}{n!}

does not make sense as a formal series: we observe that for every {N} the constant term of the partial sum changes.

But this does converge (uniformly, even) to a functional series on {U = \mathbb C}, namely to {e^{1+x}}.

Exercise 13

Let {(F_k)_{k \ge 1}} be formal series.

  • Show that an infinite sum {\sum_{k=1}^\infty F_k(x)} converges as formal series exactly when {\lim_k \left\lVert F_k \right\rVert = 0}.
  • Assume for convenience {F_k(0) = 1} for each {k}. Show that an infinite product {\prod_{k=0}^{\infty} (1+F_k)} converges as formal series exactly when {\lim_k \left\lVert F_k-1 \right\rVert = 0}.

Now the upshot is that one example of a convergent formal sum is the expression {\lim_{N} \sum_{n=0}^N a_nx^n} itself! This means we can use standard “radius of convergence” arguments to transfer a formal series into functional one.

Theorem 14 (Constructing {G} from {F})

Let {F = \sum a_nx^n} be a formal series and let

\displaystyle r = \frac{1}{\limsup_n \sqrt[n]{|c_n|}}.

If {r > 0} then there exists a functional series {G} on {U = \{ |z| < r \}} such that {F \simeq G}.

Proof: Let {F_k} and {G_k} be the corresponding partial sums of {c_0x^0} to {c_kx^k}. Then by Cauchy-Hadamard theorem, we have {G_k \rightarrow G} uniformly on (compact subsets of) {U}. Also, {\lim_k F_k = F} by construction. \Box

This works less well with products: for example we have

\displaystyle 1 \equiv (1-x) \prod_{j \ge 0} (1+x^{2^j})

as formal series, but we can’t “plug in {x=1}”, for example,

6. Finishing the original problem

We finally return to the original problem: we wish to show that the equality

\displaystyle P(x) = \prod_{j=1}^\infty (1 - x^{s_j})

cannot hold as formal series. We know that tacitly, this just means

\displaystyle \lim_{N \rightarrow \infty} \prod_{j=1}^N\left( 1 - x^{s_j} \right) = P(x)

as formal series.

Here is a solution obtained only by only considering coefficients, presented by Qiaochu Yuan from this MathOverflow question.

Both sides have constant coefficient {1}, so we may invert them; thus it suffices to show we cannot have

\displaystyle \frac{1}{P(x)} = \frac{1}{\prod_{j=1}^{\infty} (1 - x^{s_j})}

as formal power series.

The coefficients on the LHS have asymptotic growth a polynomial times an exponential.

On the other hand, the coefficients of the RHS can be shown to have growth both strictly larger than any polynomial (by truncating the product) and strictly smaller than any exponential (by comparing to the growth rate in the case where {s_j = j}, which gives the partition function {p(n)} mentioned before). So the two rates of growth can’t match.

Approximating E3-LIN is NP-Hard

This lecture, which I gave for my 18.434 seminar, focuses on the MAX-E3LIN problem. We prove that approximating it is NP-hard by a reduction from LABEL-COVER.

1. Introducing MAX-E3LIN

In the MAX-E3LIN problem, our input is a series of linear equations {\pmod 2} in {n} binary variables, each with three terms. Equivalently, one can think of this as {\pm 1} variables and ternary products. The objective is to maximize the fraction of satisfied equations.

Example 1 (Example of MAX-E3LIN instance)

\displaystyle \begin{aligned} x_1 + x_3 + x_4 &\equiv 1 \pmod 2 \\ x_1 + x_2 + x_4 &\equiv 0 \pmod 2 \\ x_1 + x_2 + x_5 &\equiv 1 \pmod 2 \\ x_1 + x_3 + x_5 &\equiv 1 \pmod 2 \end{aligned}

\displaystyle \begin{aligned} x_1 x_3 x_4 &= -1 \\ x_1 x_2 x_4 &= +1 \\ x_1 x_2 x_5 &= -1 \\ x_1 x_3 x_5 &= -1 \end{aligned}

A diligent reader can check that we may obtain {\frac34} but not {1}.

Remark 2

We immediately notice that

  • If there’s a solution with value {1}, we can find it easily with {\mathbb F_2} linear algebra.
  • It is always possible to get at least {\frac{1}{2}} by selecting all-zero or all-one.

The theorem we will prove today is that these “obvious” observations are essentially the best ones possible! Our main result is that improving the above constants to 51% and 99%, say, is NP-hard.

Theorem 3 (Hardness of MAX-E3LIN)

The {\frac{1}{2}+\varepsilon} vs. {1-\delta} decision problem for MAX-E3LIN is NP-hard.

This means it is NP-hard to decide whether an MAX-E3LIN instance has value {\le \frac{1}{2}+\varepsilon} or {\ge 1-\delta} (given it is one or the other). A direct corollary of this is approximating MAX-SAT is also NP-hard.

Corollary 4

The {\frac78+\varepsilon} vs. {1-\delta} decision problem for MAX-SAT is NP-hard.

Remark 5

The constant {\frac78} is optimal in light of a random assignment. In fact, one can replace {1-\delta} with {\delta}, but we don’t do so here.

Proof: Given an equation {a+b+c=1} in MAX-E3LIN, we consider four formulas {a \lor \neg b \lor \neg c}, {\neg a \lor b \lor \neg c}, {a \lor \neg b \lor \neg c}, {a \lor b \lor c}. Either three or four of them are satisfied, with four occurring exactly when {a+b+c=0}. One does a similar construction for {a+b+c=1}. \Box

The hardness of MAX-E3LIN is relevant to the PCP theorem: using MAX-E3LIN gadgets, Ha}stad was able to prove a very strong version of the PCP theorem, in which the verifier merely reads just three bits of a proof!

Theorem 6 (Hastad PCP)

Let {\varepsilon, \delta > 0}. We have

\displaystyle \mathbf{NP} \subseteq \mathbf{PCP}_{\frac{1}{2}+\varepsilon, 1-\delta}(3, O(\log n)).

In other words, any {L \in \mathbf{NP}} has a (non-adaptive) verifier with the following properties.

  • The verifier uses {O(\log n)} random bits, and queries just three (!) bits.
  • The acceptance condition is either {a+b+c=1} or {a+b+c=0}.
  • If {x \in L}, then there is a proof {\Pi} which is accepted with probability at least {1-\delta}.
  • If {x \notin L}, then every proof is accepted with probability at most {\frac{1}{2} + \varepsilon}.

2. Label Cover

We will prove our main result by reducing from the LABEL-COVER. Recall LABEL-COVER is played as follows: we have a bipartite graph {G = U \cup V}, a set of keys {K} for vertices of {U} and a set of labels {L} for {V}. For every edge {e = \{u,v\}} there is a function {\pi_e : L \rightarrow K} specifying a key {k = \pi_e(\ell) \in K} for every label {\ell \in L}. The goal is to label the graph {G} while maximizing the number of edges {e} with compatible key-label pairs.

Approximating LABEL-COVER is NP-hard:

Theorem 7 (Hardness of LABEL-COVER)

The {\eta} vs. {1} decision problem for LABEL-COVER is NP-hard for every {\eta > 0}, given {|K|} and {|L|} are sufficiently large in {\eta}.

So for any {\eta > 0}, it is NP-hard to decide whether one can satisfy all edges or fewer than {\eta} of them.

3. Setup

We are going to make a reduction of the following shape:


In words this means that

  • “Completeness”: If the LABEL-COVER instance is completely satisfiable, then we get a solution of value {\ge 1 - \delta} in the resulting MAX-E3LIN.
  • “Soundness”: If the LABEL-COVER instance has value {\le \eta}, then we get a solution of value {\le \frac{1}{2} + \varepsilon} in the resulting MAX-E3LIN.

Thus given an oracle for MAX-E3LIN decision, we can obtain {\eta} vs. {1} decision for LABEL-COVER, which we know is hard.

The setup for this is quite involved, using a huge number of variables. Just to agree on some conventions:

Definition 8 (“Long Code”)

A {K}-indexed binary string {x = (x_k)_k} is a {\pm 1} sequence indexed by {K}. We can think of it as an element of {\{\pm 1\}^K}. An {L}-binary string {y = (y_\ell)_\ell} is defined similarly.

Now we initialize {|U| \cdot 2^{|K|} + |V| \cdot 2^{|L|}} variables:

  • At every vertex {u \in U}, we will create {2^{|K|}} binary variables, one for every {K}-indexed binary string. It is better to collect these variables into a function

    \displaystyle f_u : \{\pm1\}^K \rightarrow \{\pm1\}.

  • Similarly, at every vertex {v \in V}, we will create {2^{|L|}} binary variables, one for every {L}-indexed binary string, and collect these into a function

    \displaystyle g_v : \{\pm1\}^L \rightarrow \{\pm1\}.



Next we generate the equations. Here’s the motivation: we want to do this in such a way that given a satisfying labelling for LABEL-COVER, nearly all the MAX-E3LIN equations can be satisfied. One idea is as follows: for every edge {e}, letting {\pi = \pi_e},

  • Take a {K}-indexed binary string {x = (x_k)_k} at random. Take an {L}-indexed binary string {y = (y_\ell)_\ell} at random.
  • Define the {L}-indexed binary {z = (z_\ell)_\ell} string by {z = \left( x_{\pi(\ell)} y_\ell \right)}.
  • Write down the equation {f_u(x) g_v(y) g_v(z) = +1} for the MAX-E3LIN instance.

Thus, assuming we had a valid coloring of the graph, we could let {f_u} and {g_v} be the dictator functions for the colorings. In that case, {f_u(x) = x_{\pi(\ell)}}, {g_v(y) = y_\ell}, and {g_v(z) = x_{\pi(\ell)} y_\ell}, so the product is always {+1}.

Unfortunately, this has two fatal flaws:

  1. This means a {1} instance of LABEL-COVER gives a {1} instance of MAX-E3LIN, but we need {1-\delta} to have a hope of working.
  2. Right now we could also just set all variables to be {+1}.

We fix this as follows, by using the following equations.

Definition 8 (Equations of reduction)

For every edge {e}, with {\pi = \pi_e}, we alter the construction and say

  • Let {x = (x_k)_k} be and {y = (y_\ell)_\ell} be random as before.
  • Let {n = (n_\ell)_\ell} be a random {L}-indexed binary string, drawn from a {\delta}-biased distribution ({-1} with probability {\delta}). And now define {z = (z_\ell)_\ell} by

    \displaystyle z_\ell = x_{\pi(\ell)} y_\ell n_\ell .

    The {n_\ell} represents “noise” bits, which resolve the first problem by corrupting a bit of {z} with probability {\delta}.

  • Write down one of the following two equations with {\frac{1}{2}} probability each:

    \displaystyle \begin{aligned} f_u(x) g_v(y) g_v(z) &= +1 \\ f_u(x) g_v(y) g_v(-z) &= -1. \end{aligned}

    This resolves the second issue.

This gives a set of {O(|E|)} equations.

I claim this reduction works. So we need to prove the “completeness” and “soundness” claims above.

4. Proof of Completeness

Given a labeling of {G} with value {1}, as described we simply let {f_u} and {g_v} be dictator functions corresponding to this valid labelling. Then as we’ve seen, we will pass {1 - \delta} of the equations.

5. A Fourier Computation

Before proving soundness, we will first need to explicitly compute the probability an equation above is satisfied. Remember we generated an equation for {e} based on random strings {x}, {y}, {\lambda}.

For {T \subseteq L}, we define

\displaystyle \pi^{\text{odd}}_e(T) = \left\{ k \in K \mid \left\lvert \pi_e^{-1}(k) \cap T \right\rvert \text{ is odd} \right\}.

Thus {T} maps subsets of {L} to subsets of {K}.

Remark 9

Note that {|\pi^{\text{odd}}(T)| \le |T|} and that {\pi^{\text{odd}}(T) \neq \varnothing} if {|T|} is odd.

Lemma 10 (Edge Probability)

The probability that an equation generated for {e = \{u,v\}} is true is

\displaystyle \frac{1}{2} + \frac{1}{2} \sum_{\substack{T \subseteq L \\ |T| \text{ odd}}} (1-2\delta)^{|T|} \widehat g_v(T)^2 \widehat f_u(\pi^{\text{odd}}_e(T)).

Proof: Omitted for now\dots \Box

6. Proof of Soundness

We will go in the reverse direction and show (constructively) that if there is MAX-E3LIN instance has a solution with value {\ge\frac{1}{2}+2\varepsilon}, then we can reconstruct a solution to LABEL-COVER with value {\ge \eta}. (The use of {2\varepsilon} here will be clear in a moment). This process is called “decoding”.

The idea is as follows: if {S} is a small set such that {\widehat f_u(S)} is large, then we can pick a key from {S} at random for {f_u}; compare this with the dictator functions where {\widehat f_u(S) = 1} and {|S| = 1}. We want to do something similar with {T}.

Here are the concrete details. Let {\Lambda = \frac{\log(1/\varepsilon)}{2\delta}} and {\eta = \frac{\varepsilon^3}{\Lambda^2}} be constants (the actual values arise later).

Definition 11

We say that a nonempty set {S \subseteq K} of keys is heavy for {u} if

\displaystyle \left\lvert S \right\rvert \le \Lambda \qquad\text{and}\qquad \widehat{f_u}(S) \ge \varepsilon^2.

Note that there are at most {\varepsilon^{-2}} heavy sets by Parseval.

Definition 12

We say that a nonempty set {T \subseteq L} of labels is {e}-excellent for {v} if

\displaystyle \left\lvert T \right\rvert \le \Lambda \qquad\text{and}\qquad S = \pi^{\text{odd}}_e(T) \text{ is heavy.}

In particular {S \neq \varnothing} so at least one compatible key-label pair is in {S \times T}.

Notice that, unlike the case with {S}, the criteria for “good” in {T} actually depends on the edge {e} in question! This makes it easier to keys than to select labels. In order to pick labels, we will have to choose from a {\widehat g_v^2} distribution.

Lemma 13 (At least {\varepsilon} of {T} are excellent)

For any edge {e = \{u,v\}}, at least {\varepsilon} of the possible {T} according to the distribution {\widehat g_v^2} are {e}-excellent.

Proof: Applying an averaging argument to the inequality

\displaystyle \sum_{\substack{T \subseteq L \\ |T| \text{ odd}}} (1-2\delta)^{|T|} \widehat g_v(T)^2 \left\lvert \widehat f_u(\pi^{\text{odd}}(T)) \right\rvert \ge 2\varepsilon

shows there is at least {\varepsilon} chance that {|T|} is odd and satisfies

\displaystyle (1-2\delta)^{|T|} \left\lvert \widehat f_u(S) \right\rvert \ge \varepsilon

where {S = \pi^{\text{odd}}_e(T)}. In particular, {(1-2\delta)^{|T|} \ge \varepsilon \iff |T| \le \Lambda}. Finally by \Cref{rem:po}, we see {S} is heavy. \Box

Now, use the following algorithm.

  • For every vertex {u \in U}, take the union of all heavy sets, say

    \displaystyle \mathcal H = \bigcup_{S \text{ heavy}} S.

    Pick a random key from {\mathcal H}. Note that {|\mathcal H| \le \Lambda\varepsilon^{-2}}, since there are at most {\varepsilon^{-2}} heavy sets (by Parseval) and each has at most {\Lambda} elements.

  • For every vertex {v \in V}, select a random set {T} according to the distribution {\widehat g_v(T)^2}, and select a random element from {T}.

I claim that this works.

Fix an edge {e}. There is at least an {\varepsilon} chance that {T} is {e}-excellent. If it is, then there is at least one compatible pair in {\mathcal H \times T}. Hence we conclude probability of success is at least

\displaystyle \varepsilon \cdot \frac{1}{\Lambda \varepsilon^{-2}} \cdot \frac{1}{\Lambda} = \frac{\varepsilon^3}{\Lambda^2} = \eta.

(Addendum: it’s pointed out to me this isn’t quite right; the overall probability of the equation given by an edge {e} is {\ge \frac{1}{2}+\varepsilon}, but this doesn’t imply it for every edge. Thus one likely needs to do another averaging argument.)