Formal vs Functional Series (OR: Generating Function Voodoo Magic)

Epistemic status: highly dubious. I found almost no literature doing anything quite like what follows, which unsettles me because it makes it likely that I’m overcomplicating things significantly.

1. Synopsis

Recently I was working on an elegant problem which was the original Problem 6 for the 2015 International Math Olympiad. It reads as follows:


[IMO Shortlist 2015 Problem C6] Let {S} be a nonempty set of positive integers. We say that a positive integer {n} is clean if it has a unique representation as a sum of an odd number of distinct elements from {S}. Prove that there exist infinitely many positive integers that are not clean.

Proceeding by contradiction, one can prove (try it!) that in fact all sufficiently large integers have exactly one representation as a sum of an even number of distinct elements of {S}. Then, the problem reduces to the following:


Show that if {s_1 < s_2 < \dots} is an increasing sequence of positive integers and {P(x)} is a nonzero polynomial then we cannot have

\displaystyle \prod_{j=1}^\infty (1 - x^{s_j}) = P(x)

as formal series.

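To see why, note that for all sufficiently large {N}, the coefficient of {x^N} in the product is {1 + (-1) = 0}, since {N} has exactly one representation using an even number of elements and exactly one using an odd number. So the product is a polynomial {P(x)}. Now, the intuitive idea is obvious: the root {1} appears with finite multiplicity in {P}, so we can write {P(x) = (1-x)^k Q(x)} where {Q(1) \neq 0}, and then the infinitely many factors of {1-x} on the RHS divide {P} too many times, right?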

Well, there are some obvious issues with this “proof”: for example, consider the equality

\displaystyle 1 = (1-x)(1+x)(1+x^2)(1+x^4)(1+x^8) \dots.

The right-hand side is “divisible” by {1-x}, but the left-hand side is not (as a polynomial).
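The equality itself can be checked at the level of coefficients. Here is a minimal Python sketch (the helper names `poly_mul` and `truncated_product` are illustrative): it verifies that the truncation {(1-x)(1+x)(1+x^2)\cdots(1+x^{2^{m-1}})} equals {1 - x^{2^m}}, so each fixed coefficient eventually agrees with that of {1}, even though every truncation is divisible by {1-x}.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (index = power of x)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def truncated_product(m):
    """Coefficients of (1 - x) * prod_{j < m} (1 + x^(2^j))."""
    result = [1, -1]  # the polynomial 1 - x
    for j in range(m):
        factor = [0] * (2**j + 1)
        factor[0] = factor[-1] = 1  # the polynomial 1 + x^(2^j)
        result = poly_mul(result, factor)
    return result


for m in range(6):
    coeffs = truncated_product(m)
    # Expect 1 - x^(2^m): only the constant term and x^(2^m) are nonzero.
    print(m, {k: c for k, c in enumerate(coeffs) if c != 0})
```

The coefficient of {x^{2^m}} is the only nonzero one besides the constant term, and it moves off to infinity as {m} grows.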

But we still want to use the idea of plugging {x \rightarrow 1^-}, so what is the right thing to do? It turns out that this is a complete minefield, and there are a lot of very subtle distinctions that seem to not be explicitly mentioned in many places. I think I have a complete answer now, but it’s long enough to warrant this entire blog post.

Here’s the short version: there are actually two distinct notions of “generating function”, namely a “formal series” and a “functional series”. They use exactly the same notation but are two different types of objects, and this ends up being the source of lots of errors, because “formal series” do not allow substituting {x}, while “functional series” do.

Spoiler: we’ll need the asymptotic for the partition function {p(n)}.

2. Formal Series {\neq} Functional Series

I’m assuming you’ve all heard the definition of {\sum_k c_kx^k}. It turns out unfortunately that this isn’t everything: there are actually two types of objects at play here. They are usually called formal power series and power series, but for this post I will use the more descriptive names formal series and functional series. I’ll do everything over {\mathbb C}, but one can of course use {\mathbb R} instead.

The formal series is easier to describe:

Definition 1

A formal series {F} is an infinite sequence {(a_n)_n = (a_0, a_1, a_2, \dots)} of complex numbers. We often denote it by {\sum a_nx^n = a_0 + a_1x + a_2x^2 + \dots}. The set of formal series is denoted {\mathbb C[ [x] ]}.

This is the “algebraic” viewpoint: it’s a sequence of coefficients. Note that there is no worry about convergence issues or “plugging in {x}”.

On the other hand, a functional series is more involved, because it has to support substitution of values of {x} and worry about convergence issues. So here are the necessary pieces of data:

Definition 2

A functional series {G} (centered at zero) is a function {G : U \rightarrow \mathbb C}, where {U} is an open disk centered at {0} or {U = \mathbb C}. We require that there exists an infinite sequence {(c_0, c_1, c_2, \dots)} of complex numbers satisfying

\displaystyle \forall z \in U: \qquad G(z) = \lim_{N \rightarrow \infty} \left( \sum_{k=0}^N c_k z^k \right).

(The limit is taken in the usual metric of {\mathbb C}.) In that case, the {c_i} are unique and called the coefficients of {G}.

This is often written as {G(x) = \sum_n c_n x^n}, with the open set {U} suppressed.

Remark 3

Some remarks on the definition of functional series:

  • This is enough to imply that {G} is holomorphic (and thus analytic) on {U}.
  • For experts: note that I’m including the domain {U} as part of the data required to specify {G}, which makes the presentation cleaner. Most sources do something with “radius of convergence”; I will blissfully ignore this, leaving this data implicitly captured by {U}.
  • For experts: perhaps non-standard, I require {U \neq \{0\}}; otherwise I can’t take derivatives, etc.

Thus formal and functional series, despite having the same notation, have different types: a formal series {F} is a sequence, while a functional series {G} is a function that happens to be expressible as an infinite sum within its domain.

Of course, from every functional series {G} we can extract its coefficients and make them into a formal series {F}. So, for lack of better notation:

Definition 4

If {F = (a_n)_n} is a formal series, and {G : U \rightarrow \mathbb C} is a functional series whose coefficients equal {F}, then we write {F \simeq G}.

3. Finite operations

Now that we have formal and functional series, we can define sums, products, and derivatives. Since these are different types of objects, we will have to run definitions in parallel and then ideally check that they respect {\simeq}.

For formal series:

Definition 5

Let {F_1 = (a_n)_n} and {F_2 = (b_n)_n} be formal series. Then we set

\displaystyle \begin{aligned} (a_n)_n \pm (b_n)_n &= (a_n \pm b_n)_n \\ (a_n)_n \cdot (b_n)_n &= \left( \textstyle\sum_{j=0}^n a_jb_{n-j} \right)_n. \end{aligned}

This makes {\mathbb C[ [x] ]} into a ring, with additive identity {(0,0,0,\dots)} and multiplicative identity {(1,0,0,\dots)}.

We also define the derivative of {F = (a_n)_n} by {F' = ((n+1)a_{n+1})_n}.

It’s probably more intuitive to write these definitions as

\displaystyle \begin{aligned} \sum_n a_n x^n \pm \sum_n b_n x^n &= \sum_n (a_n \pm b_n) x^n \\ \left( \sum_n a_n x^n \right) \left( \sum_n b_n x^n \right) &= \sum_n \left( \sum_{j=0}^n a_jb_{n-j} \right) x^n \\ \left( \sum_n a_n x^n \right)' &= \sum_n na_n x^{n-1} \end{aligned}

and in what follows I’ll start to use {\sum_n a_nx^n} more. But officially, all definitions for formal series are in terms of the coefficients alone; the presence of {x} serves as motivation only.
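Since these definitions only ever touch coefficients, they are easy to mimic on coefficient lists truncated to finitely many terms. The following is a minimal sketch under that assumption (the function names are illustrative, not from any library). It checks, for instance, that {(1-x)(1 + x + x^2 + \dots) = 1}, and that {(1 + x + x^2 + \dots)^2} agrees with the derivative of {1 + x + x^2 + \dots}, as it should since {(1/(1-x))' = 1/(1-x)^2}.

```python
def add(a, b):
    """Coefficientwise sum: (a_n + b_n)_n."""
    return [x + y for x, y in zip(a, b)]


def mul(a, b):
    """Cauchy product c_n = sum_{j=0}^{n} a_j b_{n-j}, kept to the shorter precision."""
    terms = min(len(a), len(b))
    return [sum(a[j] * b[n - j] for j in range(n + 1)) for n in range(terms)]


def deriv(a):
    """Formal derivative ((n+1) a_{n+1})_n; one term of precision is lost."""
    return [(n + 1) * a[n + 1] for n in range(len(a) - 1)]


N = 8
geometric = [1] * N                      # 1 + x + x^2 + ..., i.e. 1/(1-x)
one_minus_x = [1, -1] + [0] * (N - 2)    # 1 - x

print(mul(one_minus_x, geometric))         # [1, 0, 0, 0, 0, 0, 0, 0]
print(mul(geometric, geometric)[: N - 1])  # [1, 2, 3, 4, 5, 6, 7]
print(deriv(geometric))                    # [1, 2, 3, 4, 5, 6, 7]
```

Nothing here ever evaluates anything at a value of {x}; only the index bookkeeping of the definitions is used.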

Exercise 6

Show that if {F = \sum_n a_nx^n} is a formal series, then it has a multiplicative inverse if and only if {a_0 \neq 0}.
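One direction of the exercise is constructive: if {a_0 \neq 0}, the equation {F \cdot G = 1} can be solved for the coefficients of {G = \sum_n b_nx^n} one at a time, via {b_0 = 1/a_0} and {b_n = -\frac{1}{a_0} \sum_{j=1}^n a_j b_{n-j}}. Here is a minimal sketch with exact rational arithmetic (`formal_inverse` is a hypothetical helper name) that computes only finitely many coefficients.

```python
from fractions import Fraction


def formal_inverse(a, n_terms):
    """First n_terms coefficients of 1/F, where a lists the coefficients of F.

    Solves sum_{j=0}^{n} a_j b_{n-j} = [n == 0] for b_0, b_1, ... in turn,
    which is possible exactly because a_0 != 0.
    """
    if a[0] == 0:
        raise ValueError("F has no inverse: its constant term is 0")
    a = [Fraction(c) for c in a] + [Fraction(0)] * n_terms  # pad with zeros
    b = [1 / a[0]]
    for n in range(1, n_terms):
        b.append(-sum(a[j] * b[n - j] for j in range(1, n + 1)) / a[0])
    return b


# 1/(1 - x) = 1 + x + x^2 + ...
print([str(c) for c in formal_inverse([1, -1], 6)])       # ['1', '1', '1', '1', '1', '1']
# 1/(1 - x - x^2) has Fibonacci coefficients
print([str(c) for c in formal_inverse([1, -1, -1], 10)])
# 1/(2 - x) = 1/2 + x/4 + x^2/8 + ...
print([str(c) for c in formal_inverse([2, -1], 4)])        # ['1/2', '1/4', '1/8', '1/16']
```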

On the other hand, with functional series, the above operations are even simpler:

Definition 7

Let {G_1 : U \rightarrow \mathbb C} and {G_2 : U \rightarrow \mathbb C} be functional series with the same domain {U}. Then {G_1 \pm G_2} and {G_1 \cdot G_2} are defined pointwise.

If {G : U \rightarrow \mathbb C} is a functional series (hence holomorphic), then {G'} is defined pointwise.

If {G} is nonvanishing on {U}, then {1/G : U \rightarrow \mathbb C} is defined pointwise (and otherwise is not defined).

Now, for these finite operations, everything works as you expect:

Theorem 8 (Compatibility of finite operations)

Suppose {F}, {F_1}, {F_2} are formal series, and {G}, {G_1}, {G_2} are functional series {U \rightarrow \mathbb C}. Assume {F \simeq G}, {F_1 \simeq G_1}, {F_2 \simeq G_2}.

  • {F_1 \pm F_2 \simeq G_1 \pm G_2}, {F_1 \cdot F_2 \simeq G_1 \cdot G_2}.
  • {F' \simeq G'}.
  • If {1/G} is defined, then {1/F} is defined and {1/F \simeq 1/G}.

So far so good, as long as we’re doing finite operations. But once we step beyond that, things begin to go haywire.

4. Limits

We need to start considering limits of {(F_k)_k} and {(G_k)_k}, since we are trying to make progress towards infinite sums and products. Once we do this, things start to burn.

Definition 9

Let {F_1 = \sum_n a_n x^n} and {F_2 = \sum_n b_n x^n} be formal series, and define the distance between them by

\displaystyle d(F_1, F_2) = \begin{cases} 2^{-n} & a_n \neq b_n, \; n \text{ minimal} \\ 0 & F_1 = F_2. \end{cases}

This function makes {\mathbb C[[x]]} into a metric space, so we can discuss limits in this space. Moreover, {d} is translation-invariant, so it is determined by {\left\lVert F \right\rVert = d(F,0)} via {d(F_1, F_2) = \left\lVert F_1 - F_2 \right\rVert}.

Thus, {\lim_{k \rightarrow \infty} F_k = F} exactly when, for each {n}, the coefficient of {x^n} in {F_k} is eventually equal to the coefficient of {x^n} in {F}. For example, as formal series the sequence {(1,-1,0,0,\dots)}, {(1,0,-1,0,\dots)}, {(1,0,0,-1,\dots)}, {\dots} converges to {1 = (1,0,0,0,\dots)}, which we write as

\displaystyle \lim_{k \rightarrow \infty} (1 - x^k) = 1 \qquad \text{as formal series}.
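This limit can be checked mechanically. Here is a minimal sketch of {d} restricted to finitely supported coefficient lists (an assumption made so that everything is finite; the name `d` simply mirrors Definition 9): the distance from {1 - x^k} to {1} is {2^{-k}}, which tends to {0}.

```python
def d(a, b):
    """Distance 2^(-n), n = first index where the coefficient lists differ; 0 if equal.

    The shorter list is padded with zeros, so both lists model formal
    series whose remaining coefficients are all 0.
    """
    length = max(len(a), len(b))
    a = a + [0] * (length - len(a))
    b = b + [0] * (length - len(b))
    for n, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return 2.0 ** (-n)
    return 0.0


one = [1]
for k in range(1, 6):
    F_k = [1] + [0] * (k - 1) + [-1]  # the formal series 1 - x^k
    print(k, d(F_k, one))             # 2^(-k)
```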

As for functional series, since they are functions on the same open set {U}, we can use pointwise convergence or the stronger uniform convergence; we’ll say explicitly which one we’re doing.

Example 10 (Limits don’t work at all)

In what follows, {F_k \simeq G_k} for every {k}.

  • Here is an example showing that if {\lim_k F_k = F}, the functions {G_k} may not converge even pointwise. Indeed, just take {F_k = 1 - x^k} as before, and let {U = \{ z : |z| < 2 \}}; then {G_k(z) = 1 - z^k} diverges at every {z} with {1 < |z| < 2}.
  • Here is an example showing that even if {G_k \rightarrow G} uniformly, {\lim_k F_k} may not exist. Take {G_k = 1 - 1/k} as constant functions. Then {G_k \rightarrow 1}, but {\lim_k F_k} doesn’t exist because the constant term never stabilizes (in the combinatorial sense).
  • The following example from this math.SE answer by Robert Israel shows that it’s possible that {F = \lim_k F_k} exists, and {G_k \rightarrow G} pointwise, and still {F \not\simeq G}. Let {U} be the open unit disk, and set

    \displaystyle \begin{aligned} A_k &= \{z = r e^{i\theta} \mid 2/k \le r \le 1, \; 0 \le \theta \le 2\pi - 1/k\} \\ B_k &= \left\{ |z| \le 1/k \right\} \end{aligned}

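    for {k \ge 1}. The sets {A_k} and {B_k} are disjoint and compact, {0 \notin A_k}, and the complement of {A_k \cup B_k} is connected, so by Runge’s theorem there’s a polynomial {p_k(z)} such that

    \displaystyle |p_k(z) - 1/z^{k}| < 1/k \text{ on } A_k \qquad \text{and} \qquad |p_k(z)| < 1/k \text{ on }B_k.

    Then

    \displaystyle G_k(z) = z^{k+1} p_k(z)

    is the desired counterexample (with {F_k} being the sequence of coefficients of {G_k}). Indeed, the coefficients of {x^0, \dots, x^k} in {F_k} vanish, so {\left\lVert F_k \right\rVert \le 2^{-k}} for each {k} and {\lim_k F_k = 0}. But {|G_k(z) - z| \le 2/k} for {z \in A_k \cup B_k}, and every point of the unit disk lies in {A_k \cup B_k} for all large {k}, so {G_k} converges pointwise to the identity function {z \mapsto z}, whose coefficients are not those of {0}.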

To be fair, we do have the following saving grace:

Theorem 11 (Uniform convergence plus existence of both limits suffices)

Suppose that {G_k \rightarrow G} uniformly on {U}. If {F_k \simeq G_k} for every {k} and {\lim_k F_k = F}, then {F \simeq G}.

Proof: Here is a proof, adapted from this math.SE answer by Joey Zhou. Write {G(z) = \sum_j a_j z^j} and {G_k(z) = \sum_j a^{(k)}_j z^j} on {U}, and fix {r > 0} such that the circle {|z| = r} lies in {U}. By Cauchy’s integral formula, for each fixed {j} we have

\displaystyle \begin{aligned} \left|a_j - a^{(k)}_j\right| &= \left|\frac{1}{2\pi i} \int\limits_{|z|=r}{\frac{G(z)-G_k(z)}{z^{j+1}}\text{ d}z}\right| \\ & \le\frac{1}{2\pi}(2\pi r)\frac{1}{r^{j+1}}\max\limits_{|z|=r}{|G(z)-G_k(z)|} \xrightarrow{k\rightarrow\infty} 0 \end{aligned}

since {G_k} converges uniformly to {G} on {U}. Hence {a_j = \lim_{k\rightarrow\infty} a^{(k)}_j}. On the other hand, {\lim_k F_k = F} means that for each {j}, the coefficient {a^{(k)}_j} is eventually equal to the coefficient of {x^j} in {F}. So {F} and {G} have the same coefficients, i.e. {F \simeq G}. \Box
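The identity at the heart of this proof, {a_j = \frac{1}{2\pi i}\int_{|z|=r} G(z) z^{-j-1}\,dz}, also gives a practical way to read off coefficients from the values of {G} on a circle. A minimal sketch, assuming we approximate the contour integral by the trapezoid rule with {M} equally spaced points (the helper name `cauchy_coefficients` and the default {M = 64} are illustrative choices):

```python
import cmath
import math


def cauchy_coefficients(G, r, num_coeffs, M=64):
    """Approximate the first num_coeffs Taylor coefficients of G at 0.

    Uses a_j = 1/(2*pi*i) * contour integral of G(z) / z^(j+1) over |z| = r.
    With z = r e^(i theta) this becomes (1/2pi) * integral of G(z) z^(-j) d theta,
    discretized by the trapezoid rule at M equally spaced points.
    The circle |z| = r must lie inside the domain U of G.
    """
    points = [r * cmath.exp(2j * math.pi * m / M) for m in range(M)]
    values = [G(z) for z in points]
    return [sum(v * z ** (-j) for v, z in zip(values, points)) / M
            for j in range(num_coeffs)]


coeffs = cauchy_coefficients(cmath.exp, r=0.5, num_coeffs=6)
for j, c in enumerate(coeffs):
    print(j, round(c.real, 10), 1 / math.factorial(j))  # compare with 1/j!
```

The discrete sum actually returns {a_j + a_{j+M} r^M + a_{j+2M} r^{2M} + \dots}, so the aliasing error is negligible once {M} is large compared with the number of coefficients requested.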

The take-away from this section is that limits are relatively poorly behaved.

5. Infinite sums and products

Naturally, infinite sums and products are defined by taking the limit of partial sums and partial products. The following example (from math.SE again) shows the nuances of this behavior.

Example 12 (On {e^{1+x}})

The expression

\displaystyle \sum_{n=0}^\infty \frac{(1+x)^n}{n!} = \lim_{N \rightarrow \infty} \sum_{n=0}^N \frac{(1+x)^n}{n!}

does not make sense as a formal series: we observe that for every {N} the constant term of the partial sum changes.

But this does converge (uniformly, even) to a functional series on {U = \mathbb C}, namely to {e^{1+x}}.
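Here is a short sketch that makes both halves of this example concrete (the helper names are illustrative). It tracks the constant coefficient of the partial sums exactly, namely {\sum_{n \le N} 1/n!}, which changes at every {N}. It also evaluates the partial sums at one sample point {z}, where they approach {e^{1+z}}.

```python
from fractions import Fraction
import cmath
import math


def constant_term(N):
    """Constant coefficient of sum_{n=0}^{N} (1+x)^n / n!, which is sum_{n<=N} 1/n!."""
    return sum(Fraction(1, math.factorial(n)) for n in range(N + 1))


def partial_sum(z, N):
    """Value of the functional partial sum sum_{n=0}^{N} (1+z)^n / n! at the point z."""
    return sum((1 + z) ** n / math.factorial(n) for n in range(N + 1))


# The constant coefficient never stabilizes, so there is no formal limit...
print([str(constant_term(N)) for N in range(5)])  # ['1', '2', '5/2', '8/3', '65/24']

# ...but pointwise the partial sums converge to e^(1+z).
z = 0.3 - 0.7j
print(abs(partial_sum(z, 30) - cmath.exp(1 + z)))
```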

Exercise 13

Let {(F_k)_{k \ge 1}} be formal series.

  • Show that an infinite sum {\sum_{k=1}^\infty F_k(x)} converges as formal series exactly when {\lim_k \left\lVert F_k \right\rVert = 0}.
  • Assume for convenience {F_k(0) = 1} for each {k}. Show that an infinite product {\prod_{k=1}^{\infty} F_k} converges as formal series exactly when {\lim_k \left\lVert F_k-1 \right\rVert = 0}.

Now the upshot is that one example of a convergent formal sum is the expression {\lim_{N} \sum_{n=0}^N a_nx^n} itself! This means we can use standard “radius of convergence” arguments to transfer a formal series into a functional one.

Theorem 14 (Constructing {G} from {F})

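Let {F = \sum a_nx^n} be a formal series and let

\displaystyle r = \frac{1}{\limsup_n \sqrt[n]{|a_n|}} \in [0, \infty].

If {r > 0} then there exists a functional series {G} on {U = \{ |z| < r \}} (with {U = \mathbb C} if {r = \infty}) such that {F \simeq G}.

Proof: Let {F_k} and {G_k} be the partial sums {\sum_{n=0}^k a_nx^n}, viewed as a formal series and as a functional series on {U} respectively. By the Cauchy-Hadamard theorem, {G_k} converges uniformly on compact subsets of {U} to some function {G}. Also, {\lim_k F_k = F} by construction. The proof of Theorem 11 only used uniform convergence on a circle {|z| = r'} inside {U}, so it applies here and gives {F \simeq G}. \Box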

This works less well with products: for example we have

\displaystyle 1 \equiv (1-x) \prod_{j \ge 0} (1+x^{2^j})

as formal series, but we cannot “plug in {x=1}”: the factor {1-x} vanishes there while the infinite product diverges.

6. Finishing the original problem

We finally return to the original problem: we wish to show that the equality

\displaystyle P(x) = \prod_{j=1}^\infty (1 - x^{s_j})

cannot hold as formal series. Tacitly, this means

\displaystyle \lim_{N \rightarrow \infty} \prod_{j=1}^N\left( 1 - x^{s_j} \right) = P(x)

as formal series.

Here is a solution that only considers coefficients, presented by Qiaochu Yuan in this MathOverflow question.

Both sides have constant coefficient {1}, so we may invert them; thus it suffices to show we cannot have

\displaystyle \frac{1}{P(x)} = \frac{1}{\prod_{j=1}^{\infty} (1 - x^{s_j})}

as formal power series.

The coefficients of the LHS grow asymptotically like a polynomial times an exponential.

On the other hand, the coefficients of the RHS can be shown to have growth both strictly larger than any polynomial (by truncating the product) and strictly smaller than any exponential (by comparing to the growth rate in the case where {s_j = j}, which gives the partition function {p(n)} mentioned before). So the two rates of growth can’t match.
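To get a feel for the comparison case {s_j = j}, one can compute the coefficients of {1/\prod_{j \ge 1}(1 - x^j)}, namely the partition numbers {p(n)}, by multiplying in one factor {1/(1-x^j) = 1 + x^j + x^{2j} + \dots} at a time up to a fixed degree. A minimal sketch (the helper name is illustrative): {p(n)} eventually overtakes a fixed power of {n}, while {p(n)^{1/n}} drifts down toward {1}, consistent with growth that is faster than polynomial but slower than exponential.

```python
def partition_numbers(max_n):
    """Coefficients p(0), ..., p(max_n) of prod_{j>=1} 1/(1 - x^j).

    Multiplying by 1/(1 - x^j) = 1 + x^j + x^(2j) + ... amounts to the
    in-place update p[n] += p[n - j] for n = j, j+1, ..., max_n.
    """
    p = [1] + [0] * max_n
    for j in range(1, max_n + 1):
        for n in range(j, max_n + 1):
            p[n] += p[n - j]
    return p


p = partition_numbers(400)
print(p[:10])  # [1, 1, 2, 3, 5, 7, 11, 15, 22, 30]
for n in (50, 100, 200, 400):
    # Compare with the polynomial n^5, and look at the exponential rate p(n)^(1/n).
    print(n, p[n] > n**5, p[n] ** (1 / n))
```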

Things Fourier

For some reason several classes at MIT this year involve Fourier analysis. I was always confused about this as a high schooler, because no one ever gave me the “orthonormal basis” explanation, so here goes. As a bonus, I also prove a form of Arrow’s Impossibility Theorem using binary Fourier analysis, and then talk about the fancier generalizations using Pontryagin duality and the Peter-Weyl theorem.

In what follows, we let {\mathbb T = \mathbb R/\mathbb Z} denote the “circle group”, thought of as the additive group of “real numbers modulo {1}”. There is a canonical map {e : \mathbb T \rightarrow \mathbb C} sending {\mathbb T} to the complex unit circle, given by {e(\theta) = \exp(2\pi i \theta)}.

Disclaimer: I will deliberately be sloppy with convergence issues, in part because I don’t fully understand them myself, and in part because I don’t care.

1. Synopsis

Suppose we have a domain {Z} and are interested in functions {f : Z \rightarrow \mathbb C}. Naturally, the set of such functions forms a complex vector space. We like to equip the set of such functions with a positive definite inner product. The idea of Fourier analysis is to then select an orthonormal basis for this set of functions, say {(e_\xi)_{\xi}}, which we call the characters; the indices {\xi} are called frequencies. In that case, since we have a basis, every function {f : Z \rightarrow \mathbb C} becomes a sum

\displaystyle  f(x) = \sum_{\xi} \widehat f(\xi) e_\xi(x)

where {\widehat f(\xi)} are complex coefficients of the basis; appropriately we call {\widehat f} the Fourier coefficients. The variable {x \in Z} is referred to as the physical variable. This is generally good because the characters are deliberately chosen to be nice “symmetric” functions, like sine or cosine waves or other periodic functions. Thus we decompose an arbitrarily complicated function into a sum of nice ones.

For convenience, we record a few facts about orthonormal bases.

Proposition 1 (Facts about orthonormal bases)

Let {V} be a complex Hilbert space with inner form {\left< -,-\right>} and suppose {x = \sum_\xi a_\xi e_\xi} and {y = \sum_\xi b_\xi e_\xi} where {e_\xi} are an orthonormal basis. Then

\displaystyle  \begin{aligned} \left< x,x \right> &= \sum_\xi |a_\xi|^2 \\ a_\xi &= \left< x, e_\xi \right> \\ \left< x,y \right> &= \sum_\xi a_\xi \overline{b_\xi}. \end{aligned}

2. Common Examples

2.1. Binary Fourier analysis on {\{\pm1\}^n}

Let {Z = \{\pm 1\}^n} for some positive integer {n}, so we are considering functions {f(x_1, \dots, x_n)} accepting binary values. Then the functions {Z \rightarrow \mathbb C} form a {2^n}-dimensional vector space {\mathbb C^Z}, and we endow it with the inner form

\displaystyle  \left< f,g \right> = \frac{1}{2^n} \sum_{x \in Z} f(x) \overline{g(x)}.

In particular,

\displaystyle  \left< f,f \right> = \frac{1}{2^n} \sum_{x \in Z} \left\lvert f(x) \right\rvert^2

is the average of the squares; this establishes also that {\left< -,-\right>} is positive definite.

In that case, the multilinear monomials form an orthonormal basis of {\mathbb C^Z}, namely the functions

\displaystyle  \chi_S(x_1, \dots, x_n) = \prod_{s \in S} x_s.

Thus our frequency set is actually the subsets {S \subseteq \{1, \dots, n\}}. Thus, we have a decomposition

\displaystyle  f = \sum_{S \subseteq \{1, \dots, n\}} \widehat f(S) \chi_S.

Example 2 (An example of binary Fourier analysis)

Let {n = 2}. Then binary functions {\{ \pm 1\}^2 \rightarrow \mathbb C} have a basis given by the four polynomials

\displaystyle  1, \quad x_1, \quad x_2, \quad x_1x_2.

For example, consider the function {f} which is {1} at {(1,1)} and {0} elsewhere. Then we can put

\displaystyle  f(x_1, x_2) = \frac{x_1+1}{2} \cdot \frac{x_2+1}{2} = \frac14 \left( 1 + x_1 + x_2 + x_1x_2 \right).

So the Fourier coefficients are {\widehat f(S) = \frac 14} for each of the four {S}‘s.
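These coefficients can be double-checked by brute force. Since the {\chi_S} are orthonormal (and real-valued), {\widehat f(S) = \left< f, \chi_S \right>}, so a minimal sketch just averages {f(x)\chi_S(x)} over the {2^n} points of the cube (function names are illustrative; coordinates are indexed from {0}).

```python
from fractions import Fraction
from itertools import combinations, product
from math import prod


def chi(S, x):
    """Character chi_S(x) = product of x_s over s in S (0-based indices)."""
    return prod(x[s] for s in S)


def walsh_coefficients(f, n):
    """hat f(S) = <f, chi_S> = 2^(-n) * sum over x in {+1,-1}^n of f(x) chi_S(x)."""
    cube = list(product([1, -1], repeat=n))
    return {
        S: Fraction(sum(f(x) * chi(S, x) for x in cube), 2**n)
        for k in range(n + 1)
        for S in combinations(range(n), k)
    }


# Example 2: f is 1 at (1, 1) and 0 elsewhere.
f = lambda x: 1 if x == (1, 1) else 0
print(walsh_coefficients(f, 2))  # every coefficient equals 1/4
```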

This notion is useful in particular for binary functions {f : \{\pm1\}^n \rightarrow \{\pm1\}}; for these functions (and products thereof), we always have {\left< f,f \right> = 1}.

It is worth noting that the frequency {\varnothing} plays a special role:

Exercise 3

Show that

\displaystyle  \widehat f(\varnothing) = \frac{1}{|Z|} \sum_{x \in Z} f(x).

2.2. Fourier analysis on finite groups {Z}

This is the Fourier analysis used in this post and this post. Here, we have a finite abelian group {Z}, and consider functions {Z \rightarrow \mathbb C}; this is a {|Z|}-dimensional vector space. The inner product is the same as before:

\displaystyle  \left< f,g \right> = \frac{1}{|Z|} \sum_{x \in Z} f(x) \overline{g}(x).

Now here is how we generate the characters. We equip {Z} with a non-degenerate symmetric bilinear form

\displaystyle  Z \times Z \xrightarrow{\cdot} \mathbb T \qquad (\xi, x) \mapsto \xi \cdot x.

Experts may already recognize this as a choice of isomorphism between {Z} and its Pontryagin dual. This time the characters are given by

\displaystyle  \left( e_\xi \right)_{\xi \in Z} \qquad \text{where} \qquad e_\xi(x) = e(\xi \cdot x).

In this way, the set of frequencies is also {Z}, but the {\xi \in Z} play very different roles from the “physical” {x \in Z}. (It is not too hard to check these indeed form an orthonormal basis in the function space {\mathbb C^{\left\lvert Z \right\rvert}}, since we assumed that {\cdot} is non-degenerate.)

Example 4 (Cube roots of unity filter)

Suppose {Z = \mathbb Z/3\mathbb Z}, with the inner form given by {\xi \cdot x = (\xi x)/3}. Let {\omega = \exp(\frac 23 \pi i)} be a primitive cube root of unity. Note that

\displaystyle  e_\xi(x) = \begin{cases} 1 & \xi = 0 \\ \omega^x & \xi = 1 \\ \omega^{2x} & \xi = 2. \end{cases}

Then given {f : Z \rightarrow \mathbb C} with {f(0) = a}, {f(1) = b}, {f(2) = c}, we obtain

\displaystyle  f(x) = \frac{a+b+c}{3} \cdot 1 + \frac{a + \omega^2 b + \omega c}{3} \cdot \omega^x + \frac{a + \omega b + \omega^2 c}{3} \cdot \omega^{2x}.

In this way we derive that the transforms are

\displaystyle  \begin{aligned} \widehat f(0) &= \frac{a+b+c}{3} \\ \widehat f(1) &= \frac{a+\omega^2 b+ \omega c}{3} \\ \widehat f(2) &= \frac{a+\omega b+\omega^2c}{3}. \end{aligned}
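For a general cyclic group {Z = \mathbb Z/N\mathbb Z} with {\xi \cdot x = \xi x/N}, orthonormality gives {\widehat f(\xi) = \left< f, e_\xi \right> = \frac1N \sum_x f(x) \overline{e(\xi x/N)}}. Here is a minimal sketch of that computation (the helper names and the sample values of {a}, {b}, {c} are arbitrary choices); for {N = 3} it reproduces the three transforms above and reconstructs {f} from them.

```python
import cmath


def e(theta):
    """The map e: T -> C, theta -> exp(2*pi*i*theta)."""
    return cmath.exp(2j * cmath.pi * theta)


def fourier_coefficients(f_values):
    """hat f(xi) = (1/N) sum_x f(x) * conj(e(xi * x / N)) on Z/NZ."""
    N = len(f_values)
    return [sum(f_values[x] * e(xi * x / N).conjugate() for x in range(N)) / N
            for xi in range(N)]


def reconstruct(coeffs, x):
    """f(x) = sum_xi hat f(xi) e(xi * x / N)."""
    N = len(coeffs)
    return sum(c * e(xi * x / N) for xi, c in enumerate(coeffs))


a, b, c = 2.0, -1.0, 5.0
omega = e(1 / 3)
hat_f = fourier_coefficients([a, b, c])
print(hat_f[1], (a + omega**2 * b + omega * c) / 3)  # agree up to rounding
print([reconstruct(hat_f, x) for x in range(3)])     # approximately [a, b, c]
```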

Exercise 5

Show that

\displaystyle  \widehat f(0) = \frac{1}{|Z|} \sum_{x \in Z} f(x).

Olympiad contestants may recognize the previous example as a “roots of unity filter”, which is exactly the point. For concreteness, suppose one wants to compute

\displaystyle  \binom{1000}{0} + \binom{1000}{3} + \dots + \binom{1000}{999}.

In that case, we can consider the function

\displaystyle  w : \mathbb Z/3 \rightarrow \mathbb C.

such that {w(0) = 1} but {w(1) = w(2) = 0}. By abuse of notation we will also think of {w} as a function {w : \mathbb Z \twoheadrightarrow \mathbb Z/3 \rightarrow \mathbb C}. Then the sum in question is

\displaystyle  \begin{aligned} \sum_n \binom{1000}{n} w(n) &= \sum_n \binom{1000}{n} \sum_{k=0,1,2} \widehat w(k) \omega^{kn} \\ &= \sum_{k=0,1,2} \widehat w(k) \sum_n \binom{1000}{n} \omega^{kn} \\ &= \sum_{k=0,1,2} \widehat w(k) (1+\omega^k)^{1000}. \end{aligned}

In our situation, we have {\widehat w(0) = \widehat w(1) = \widehat w(2) = \frac13}, and we have evaluated the desired sum. More generally, we can take any periodic weight {w} and use Fourier analysis in order to interchange the order of summation.
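Concretely, the filter gives {\frac13\left(2^{1000} + (1+\omega)^{1000} + (1+\omega^2)^{1000}\right)}, and since {1+\omega = -\omega^2}, {1+\omega^2 = -\omega} and {\omega + \omega^2 = -1}, this equals {\frac{2^{1000}-1}{3}}. As a sanity check, here is a minimal sketch (helper names are illustrative) comparing the filter with brute-force summation, in floating point for small exponents and with exact integers for {1000}.

```python
import cmath
from math import comb


def filter_sum(n):
    """Sum of C(n, k) over k divisible by 3, via the roots of unity filter.

    Uses hat w(0) = hat w(1) = hat w(2) = 1/3, so the sum equals
    (1/3) * sum_{k=0,1,2} (1 + omega^k)^n.
    """
    omega = cmath.exp(2j * cmath.pi / 3)
    return sum((1 + omega**k) ** n for k in range(3)) / 3


def direct_sum(n):
    """Brute force: add up C(n, k) for k = 0, 3, 6, ..."""
    return sum(comb(n, k) for k in range(0, n + 1, 3))


print(all(abs(filter_sum(n) - direct_sum(n)) < 1e-6 * direct_sum(n) for n in range(1, 30)))
# For n = 1000 floating point is too coarse, but the closed form is exact:
print(direct_sum(1000) == (2**1000 - 1) // 3)
```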

Example 6 (Binary Fourier analysis)

Suppose {Z = \{\pm 1\}^n}, viewed as an abelian group under pointwise multiplication and hence isomorphic to {(\mathbb Z/2\mathbb Z)^{\oplus n}}. Assume we pick the dot product defined by

\displaystyle  \xi \cdot x = \frac{1}{2} \sum_i \frac{1-\xi_i}{2} \cdot \frac{1-x_i}{2}

where {\xi = (\xi_1, \dots, \xi_n)} and {x = (x_1, \dots, x_n)}; that is, {\xi \cdot x} is {\frac12} times the number of indices {i} with {\xi_i = x_i = -1}, taken modulo {1}.

We claim this coincides with the first example we gave. Indeed, let {S \subseteq \{1, \dots, n\}} and let {\xi \in \{\pm1\}^n} be {-1} at positions in {S} and {+1} at positions not in {S}. Then the character {\chi_S} from the previous example coincides with the character {e_\xi} in the new notation. In particular, {\widehat f(S) = \widehat f(\xi)}.

Thus Fourier analysis on a finite group {Z} subsumes binary Fourier analysis.
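One can confirm by brute force that this identification works. A minimal sketch for {n = 3}, assuming the pairing {\xi \cdot x = \frac12 \sum_i \frac{1-\xi_i}{2} \cdot \frac{1-x_i}{2}} (mod {1}), which is the pairing under which the claim holds; helper names are illustrative. It checks that {e(\xi \cdot x) = \prod_{i : \xi_i = -1} x_i = \chi_S(x)} for every {\xi} and {x}.

```python
import cmath
from itertools import product
from math import prod


def e(theta):
    """e(theta) = exp(2*pi*i*theta), well defined for theta in R/Z."""
    return cmath.exp(2j * cmath.pi * theta)


def pairing(xi, x):
    """xi . x = (1/2) * #{i : xi_i = x_i = -1}, to be read modulo 1."""
    return sum(((1 - a) // 2) * ((1 - b) // 2) for a, b in zip(xi, x)) / 2


n = 3
cube = list(product([1, -1], repeat=n))
ok = all(
    # chi_S(x) with S = {i : xi_i = -1}
    abs(e(pairing(xi, x)) - prod(x[i] for i in range(n) if xi[i] == -1)) < 1e-12
    for xi in cube
    for x in cube
)
print(ok)
```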

2.3. Fourier series for functions {L^2([-\pi, \pi])}

Now we consider the space {L^2([-\pi, \pi])} of square-integrable functions {[-\pi, \pi] \rightarrow \mathbb C}, with inner form

\displaystyle  \left< f,g \right> = \frac{1}{2\pi} \int_{[-\pi, \pi]} f(x) \overline{g(x)}.

Sadly, this is not a finite-dimensional vector space, but fortunately it is a Hilbert space so we are still fine. In this case, an orthonormal basis must allow infinite linear combinations, as long as the sum of squares is finite.

Now, it turns out in this case that

\displaystyle  (e_n)_{n \in \mathbb Z} \qquad\text{where}\qquad e_n(x) = \exp(inx)

is an orthonormal basis for {L^2([-\pi, \pi])}. Thus this time the frequency set {\mathbb Z} is infinite. So every function {f \in L^2([-\pi, \pi])} decomposes as

\displaystyle  f(x) = \sum_n \widehat f(n) \exp(inx)

for some complex coefficients {\widehat f(n)}, where the sum converges in {L^2}.

This is a little worse than our finite examples: instead of a finite sum on the right-hand side, we actually have an infinite sum. This is because our set of frequencies is now {\mathbb Z}, which isn’t finite. In this case the {\widehat f} need not be finitely supported, but do satisfy {\sum_n |\widehat f(n)|^2 < \infty}.

We call this a Fourier series, reflecting the fact that the frequencies are indexed by {n \in \mathbb Z}.

Exercise 7

Show once again

\displaystyle  \widehat f(0) = \frac{1}{2\pi} \int_{[-\pi, \pi]} f(x).

Often we require that the function {f} satisfies {f(-\pi) = f(\pi)}, so that {f} becomes a periodic function, and we can think of it as {f : \mathbb T \rightarrow \mathbb C}.

2.4. Summary

We summarize our various flavors of Fourier analysis in the following table.

\displaystyle  \begin{array}{llll} \hline \text{Type} & \text{Physical var} & \text{Frequency var} & \text{Basis functions} \\ \hline \textbf{Binary} & \{\pm1\}^n & \text{Subsets } S \subseteq \left\{ 1, \dots, n \right\} & \prod_{s \in S} x_s \\ \textbf{Finite group} & Z & \xi \in Z, \text{ choice of } \cdot, & e(\xi \cdot x) \\ \textbf{Fourier series} & \mathbb T \text{ or } [-\pi, \pi] & n \in \mathbb Z & \exp(inx) \\ \end{array}

In fact, we will soon see that all these examples are subsumed by Pontryagin duality for compact groups {G}.

3. Parseval and friends

The notion of an orthonormal basis makes several “big-name” results in Fourier analysis quite lucid. Basically, we can take every result from Proposition 1, translate it into the context of our Fourier analysis, and get a big-name result.

Corollary 8 (Parseval theorem)

Let {f : Z \rightarrow \mathbb C}, where {Z} is a finite abelian group. Then

\displaystyle  \sum_\xi |\widehat f(\xi)|^2 = \frac{1}{|Z|} \sum_{x \in Z} |f(x)|^2.

Similarly, if {f : [-\pi, \pi] \rightarrow \mathbb C} is square-integrable then its Fourier series satisfies

\displaystyle  \sum_n |\widehat f(n)|^2 = \frac{1}{2\pi} \int_{[-\pi, \pi]} |f(x)|^2.

Proof: Recall that {\left< f,f\right>} is equal to the square sum of the coefficients. \Box

Corollary 9 (Formulas for {\widehat f})

Let {f : Z \rightarrow \mathbb C}, where {Z} is a finite abelian group. Then

\displaystyle  \widehat f(\xi) = \frac{1}{|Z|} \sum_{x \in Z} f(x) \overline{e_\xi(x)}.

Similarly, if {f : [-\pi, \pi] \rightarrow \mathbb C} is square-integrable then its Fourier series is given by

\displaystyle  \widehat f(n) = \frac{1}{2\pi} \int_{[-\pi, \pi]} f(x) \exp(-inx).

Proof: Recall that in an orthonormal basis {(e_\xi)_\xi}, the coefficient of {e_\xi} in {f} is {\left< f, e_\xi\right>}. \Box
Note in particular what happens if we select {\xi = 0} in the above!
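The integral formula for {\widehat f(n)} is easy to test numerically. A minimal sketch, assuming a midpoint-rule approximation of the integral (the helper name and the number of sample points are arbitrary choices): for {f(x) = x^2} the coefficients should be {\widehat f(0) = \pi^2/3} and {\widehat f(n) = 2(-1)^n/n^2} for {n \neq 0}, which is the classical series {x^2 = \frac{\pi^2}{3} + 4\sum_{n \ge 1} \frac{(-1)^n}{n^2}\cos(nx)}.

```python
import cmath
import math


def fourier_coefficient(f, n, M=20000):
    """Approximate hat f(n) = (1/2pi) * integral over [-pi, pi] of f(x) exp(-inx) dx.

    Uses the midpoint rule with M subintervals of [-pi, pi].
    """
    h = 2 * math.pi / M
    total = 0
    for m in range(M):
        x = -math.pi + (m + 0.5) * h
        total += f(x) * cmath.exp(-1j * n * x)
    return total * h / (2 * math.pi)


f = lambda x: x * x
print(fourier_coefficient(f, 0), math.pi**2 / 3)
for n in range(1, 4):
    print(n, fourier_coefficient(f, n), 2 * (-1) ** n / n**2)
```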

Corollary 10 (Plancherel theorem)

Let {f, g : Z \rightarrow \mathbb C}, where {Z} is a finite abelian group. Then

\displaystyle  \left< f,g \right> = \sum_{\xi \in Z} \widehat f(\xi) \overline{\widehat g(\xi)}.

Similarly, if {f, g : [-\pi, \pi] \rightarrow \mathbb C} are square-integrable then

\displaystyle  \left< f,g \right> = \sum_n \widehat f(n) \overline{\widehat g(n)}.

Proof: Guess! \Box
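For readers who prefer to check rather than guess, here is a minimal numerical sketch on {Z = \mathbb Z/8\mathbb Z} with {\xi \cdot x = \xi x/8} and random complex-valued {f}, {g} (the group size, random seed, and helper names are arbitrary choices). It computes {\widehat f(\xi) = \frac{1}{|Z|}\sum_x f(x)\overline{e_\xi(x)}} and compares both sides of Parseval and Plancherel.

```python
import cmath
import random


def e(theta):
    """e(theta) = exp(2*pi*i*theta)."""
    return cmath.exp(2j * cmath.pi * theta)


def hat(f):
    """hat f(xi) = (1/N) sum_x f(x) conj(e_xi(x)), where e_xi(x) = e(xi x / N)."""
    N = len(f)
    return [sum(f[x] * e(xi * x / N).conjugate() for x in range(N)) / N
            for xi in range(N)]


def inner(f, g):
    """<f, g> = (1/N) sum_x f(x) conj(g(x))."""
    return sum(a * b.conjugate() for a, b in zip(f, g)) / len(f)


random.seed(0)
N = 8
f = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]
g = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]
fh, gh = hat(f), hat(g)

# Parseval: sum |hat f(xi)|^2 = <f, f>  (difference should be rounding error)
print(abs(sum(abs(c) ** 2 for c in fh) - inner(f, f)))
# Plancherel: <f, g> = sum hat f(xi) conj(hat g(xi))
print(abs(sum(a * b.conjugate() for a, b in zip(fh, gh)) - inner(f, g)))
```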

4. (Optional) Arrow’s Impossibility Theorem

As an application, we now prove a form of Arrow’s theorem. Consider {n} voters voting among {3} candidates {A}, {B}, {C}. Each voter specifies a tuple {v_i = (x_i, y_i, z_i) \in \{\pm1\}^3} as follows:

  • {x_i = 1} if voter {i} ranks {A} ahead of {B}, and {x_i = -1} otherwise.
  • {y_i = 1} if voter {i} ranks {B} ahead of {C}, and {y_i = -1} otherwise.
  • {z_i = 1} if voter {i} ranks {C} ahead of {A}, and {z_i = -1} otherwise.

Tacitly, we only consider {3! = 6} possibilities for {v_i}: we forbid “paradoxical” votes of the form {x_i = y_i = z_i} by assuming that people’s votes are consistent (meaning the preferences are transitive).

Then, we can consider a voting mechanism

\displaystyle  \begin{aligned} f : \{\pm1\}^n &\rightarrow \{\pm1\} \\ g : \{\pm1\}^n &\rightarrow \{\pm1\} \\ h : \{\pm1\}^n &\rightarrow \{\pm1\} \end{aligned}

such that {f(x_\bullet)} is the global preference of {A} vs. {B}, {g(y_\bullet)} is the global preference of {B} vs. {C}, and {h(z_\bullet)} is the global preference of {C} vs. {A}. We’d like to avoid situations where the global preference {(f(x_\bullet), g(y_\bullet), h(z_\bullet))} is itself paradoxical.

In fact, we will prove the following theorem:

Theorem 11 (Arrow Impossibility Theorem)

Assume that {(f,g,h)} always avoids paradoxical outcomes, and assume {\mathbf E f = \mathbf E g = \mathbf E h = 0}. Then {(f,g,h)} is either a dictatorship or anti-dictatorship: there exists a “dictator” {k} such that

\displaystyle  f(x_\bullet) = \pm x_k, \qquad g(y_\bullet) = \pm y_k, \qquad h(z_\bullet) = \pm z_k

where all three signs coincide.

The “independence of irrelevant alternatives” condition is built into this setup: the global preference between {A} and {B} depends only on the {x_i}, and similarly for {g} and {h}. The assumption {\mathbf E f = \mathbf E g = \mathbf E h = 0} provides symmetry (and e.g. excludes the possibility that {f}, {g}, {h} are constant functions which ignore voter input). Unlike the usual Arrow theorem, we do not assume that {f(+1, \dots, +1) = +1} (hence the possibility of anti-dictatorship).

To this end, we actually prove the following result:

Lemma 12

Assume the {n} voters vote independently and uniformly at random among the {3! = 6} possibilities. The probability of a paradoxical outcome is exactly

\displaystyle  \frac14 + \frac14 \sum_{S \subseteq \{1, \dots, n\}} \left( -\frac13 \right)^{\left\lvert S \right\rvert} \left( \widehat f(S) \widehat g(S) + \widehat g(S) \widehat h(S) + \widehat h(S) \widehat f(S) \right) .

Proof: Define the Boolean function {D : \{\pm 1\}^3 \rightarrow \mathbb R} by

\displaystyle  D(a,b,c) = ab + bc + ca = \begin{cases} 3 & a,b,c \text{ all equal} \\ -1 & a,b,c \text{ not all equal} \end{cases}.

Thus paradoxical outcomes arise exactly when {D(f(x_\bullet), g(y_\bullet), h(z_\bullet)) = 3}. Now we compute its expectation for randomly selected {x_\bullet}, {y_\bullet}, {z_\bullet}. For each voter, the pair {(y_i, z_i)} has the same distribution as {(x_i, y_i)}, and likewise for {(z_i, x_i)} (relabel the candidates cyclically), so

\displaystyle  \begin{aligned} \mathbf E D(f(x_\bullet), g(y_\bullet), h(z_\bullet)) &= \mathbf E \left[ f(x_\bullet) g(y_\bullet) \right] + \mathbf E \left[ g(y_\bullet) h(z_\bullet) \right] + \mathbf E \left[ h(z_\bullet) f(x_\bullet) \right] \\ &= \sum_S \sum_T \left( \widehat f(S) \widehat g(T) + \widehat g(S) \widehat h(T) + \widehat h(S) \widehat f(T) \right) \mathbf E\left( \chi_S(x_\bullet)\chi_T(y_\bullet) \right). \end{aligned}

Now we observe that:

  • If {S \neq T}, then {\mathbf E \chi_S(x_\bullet) \chi_T(y_\bullet) = 0}, since if say {s \in S}, {s \notin T} then {x_s} affects the parity of the product with 50% either way, and is independent of any other variables in the product.
  • On the other hand, suppose {S = T}. Then

    \displaystyle  \chi_S(x_\bullet) \chi_T(y_\bullet) = \prod_{s \in S} x_sy_s.

    Note that {x_sy_s} is equal to {1} with probability {\frac13} and {-1} with probability {\frac23} (since {(x_s, y_s, z_s)} is uniform from {3!=6} choices, which we can enumerate). From this an inductive calculation on {|S|} gives that

    \displaystyle  \prod_{s \in S} x_sy_s = \begin{cases} +1 & \text{ with probability } \frac{1}{2}(1+(-1/3)^{|S|}) \\ -1 & \text{ with probability } \frac{1}{2}(1-(-1/3)^{|S|}). \end{cases}

    In other words,

    \displaystyle  \mathbf E \left( \prod_{s \in S} x_sy_s \right) = \left( -\frac13 \right)^{|S|}.

Piecing this all together, we now have that

\displaystyle  \mathbf E D(f(x_\bullet), g(y_\bullet), h(z_\bullet)) = \sum_S \left( \widehat f(S) \widehat g(S) + \widehat g(S) \widehat h(S) + \widehat h(S) \widehat f(S) \right) \left( -\frac13 \right)^{|S|}.

Then, we obtain that

\displaystyle  \begin{aligned} &\mathbf E \frac14 \left( 1 + D(f(x_\bullet), g(y_\bullet), h(z_\bullet)) \right) \\ =& \frac14 + \frac14\sum_S \left( -\frac13 \right)^{|S|} \left( \widehat f(S) \widehat g(S) + \widehat g(S) \widehat h(S) + \widehat h(S) \widehat f(S) \right). \end{aligned}

Since {\frac14(1 + D)} equals {1} on paradoxical outcomes and {0} otherwise, the left-hand side is exactly the probability of a paradoxical outcome. \Box
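Lemma 12 is easy to test on small cases by brute force. Here is a minimal sketch (all helper names are illustrative) that enumerates all {6^n} voting profiles, counts paradoxical outcomes exactly, and compares the count with the Fourier formula. For three voters using majority rule on each pair, both sides should come out to the classical Condorcet-paradox probability {\frac1{18}}.

```python
from fractions import Fraction
from itertools import combinations, permutations, product
from math import prod


def walsh(f, n):
    """hat f(S) = 2^(-n) sum_{x in {+1,-1}^n} f(x) chi_S(x), for every S."""
    cube = list(product([1, -1], repeat=n))
    return {S: Fraction(sum(f(x) * prod(x[s] for s in S) for x in cube), 2**n)
            for k in range(n + 1) for S in combinations(range(n), k)}


def vote(ranking):
    """(x, y, z) for a ranking such as 'ACB': is A>B, B>C, C>A (as +1/-1)?"""
    ahead = lambda p, q: 1 if ranking.index(p) < ranking.index(q) else -1
    return ahead("A", "B"), ahead("B", "C"), ahead("C", "A")


def paradox_probability(f, g, h, n):
    """Exact probability that f(x), g(y), h(z) are all equal (a cyclic outcome)."""
    votes = [vote("".join(r)) for r in permutations("ABC")]
    bad = 0
    for profile in product(votes, repeat=n):
        x, y, z = zip(*profile)  # x = (x_1, ..., x_n), etc.
        bad += f(x) == g(y) == h(z)
    return Fraction(bad, 6**n)


def lemma_formula(f, g, h, n):
    """1/4 + 1/4 sum_S (-1/3)^|S| (fg + gh + hf)(S), as in Lemma 12."""
    F, G, H = walsh(f, n), walsh(g, n), walsh(h, n)
    total = sum(Fraction(-1, 3) ** len(S) * (F[S] * G[S] + G[S] * H[S] + H[S] * F[S])
                for S in F)
    return Fraction(1, 4) + total / 4


maj = lambda v: 1 if sum(v) > 0 else -1  # majority vote (n odd)
print(paradox_probability(maj, maj, maj, 3), lemma_formula(maj, maj, maj, 3))  # 1/18 1/18
```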

Now for the proof of the main theorem. Since {(f,g,h)} never produces a paradoxical outcome, the probability in Lemma 12 is {0}, so

\displaystyle  1 = \sum_{S \subseteq \{1, \dots, n\}} -\left( -\frac13 \right)^{\left\lvert S \right\rvert} \left( \widehat f(S) \widehat g(S) + \widehat g(S) \widehat h(S) + \widehat h(S) \widehat f(S) \right).

But now we can just use weak inequalities. We have {\widehat f(\varnothing) = \mathbf E f = 0} and similarly for {\widehat g} and {\widehat h}, so only terms with {|S| \ge 1} contribute. Since {f}, {g}, {h} are real-valued, their Fourier coefficients are real, so we may use the well-known inequality {|ab+bc+ca| \le a^2+b^2+c^2} (valid for all real numbers) to deduce that

\displaystyle  \begin{aligned} 1 &= \sum_{S \subseteq \{1, \dots, n\}} -\left( -\frac13 \right)^{\left\lvert S \right\rvert} \left( \widehat f(S) \widehat g(S) + \widehat g(S) \widehat h(S) + \widehat h(S) \widehat f(S) \right) \\ &\le \sum_{S \subseteq \{1, \dots, n\}} \left( \frac13 \right)^{\left\lvert S \right\rvert} \left( \widehat f(S)^2 + \widehat g(S)^2 + \widehat h(S)^2 \right) \\ &\le \sum_{S \subseteq \{1, \dots, n\}} \left( \frac13 \right)^1 \left( \widehat f(S)^2 + \widehat g(S)^2 + \widehat h(S)^2 \right) \\ &= \frac13 (1+1+1) = 1. \end{aligned}

with the last step by Parseval. So every inequality must be an equality. Since {(1/3)^{|S|} < 1/3} for {|S| \ge 2}, this forces {\widehat f}, {\widehat g}, {\widehat h} to be supported on one-element sets, i.e. {f}, {g}, {h} are linear in their inputs. As {f}, {g}, {h} are {\pm 1} valued, each of them is either a dictator or an anti-dictator function. Since {(f,g,h)} is always consistent, this implies the final result.
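For {n = 2} voters the conclusion can be checked exhaustively. The sketch below enumerates all {\binom42 = 6} functions {\{\pm1\}^2 \rightarrow \{\pm1\}} with {\mathbf E f = 0} and keeps the triples {(f,g,h)} that never yield an all-equal output. It again assumes each voter's triple is uniform over the six non-constant triples in {\{\pm1\}^3}; only the support of that distribution matters here. We expect to see only {(x_i, y_i, z_i)} and {(-x_i, -y_i, -z_i)}.

```python
from itertools import product

N = 2  # number of voters, kept tiny so the brute force is instant
CUBE = list(product((1, -1), repeat=N))
# Assumed voter model: each voter picks one of the 6 non-constant triples.
TRIPLES = [t for t in product((1, -1), repeat=3) if len(set(t)) > 1]

# Every balanced (mean-zero) function {+1,-1}^N -> {+1,-1}, as a lookup table.
balanced = []
for values in product((1, -1), repeat=len(CUBE)):
    if sum(values) == 0:
        balanced.append(dict(zip(CUBE, values)))


def always_consistent(f, g, h):
    """True if (f(x), g(y), h(z)) is never all equal, over every voter profile."""
    for profile in product(TRIPLES, repeat=N):
        x, y, z = zip(*profile)
        if f[x] == g[y] == h[z]:
            return False
    return True


def describe(f):
    """Return ('dictator', i) or ('anti-dictator', i) if f(x) = +-x_i, else None."""
    for i in range(N):
        if all(f[x] == x[i] for x in CUBE):
            return ("dictator", i)
        if all(f[x] == -x[i] for x in CUBE):
            return ("anti-dictator", i)
    return None


consistent = [(f, g, h) for f, g, h in product(balanced, repeat=3)
              if always_consistent(f, g, h)]
for f, g, h in consistent:
    print(describe(f), describe(g), describe(h))
```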

5. Pontryagin duality

In fact all the examples we have covered can be subsumed as special cases of Pontryagin duality, where we replace the domain with a general group {G}. In what follows, we assume {G} is a locally compact abelian (LCA) group, which just means that:

  • {G} is an abelian topological group,
  • the topology on {G} is Hausdorff, and
  • the topology on {G} is locally compact: every point of {G} has a compact neighborhood.

Notice that our previous examples fall into this category:

Example 13 (Examples of locally compact abelian groups)

  • Any finite abelian group {Z} with the discrete topology is LCA.
  • The circle group {\mathbb T} is LCA and also in fact compact.
  • The real numbers {\mathbb R} are an example of an LCA group which is not compact.

5.1. The Pontryagin dual

The key definition is:

Definition 14

Let {G} be an LCA group. Then its Pontryagin dual is the abelian group

\displaystyle  \widehat G \overset{\mathrm{def}}{=} \left\{ \text{continuous group homomorphisms } \xi : G \rightarrow \mathbb T \right\}.

The maps {\xi} are called characters. Equipping {\widehat G} with the compact-open topology makes it an LCA group as well.

Example 15 (Examples of Pontryagin duals)

  • {\widehat{\mathbb Z} \cong \mathbb T}.
  • {\widehat{\mathbb T} \cong \mathbb Z}. The characters are given by {\theta \mapsto n\theta} for {n \in \mathbb Z}.
  • {\widehat{\mathbb R} \cong \mathbb R}. This is because a nonzero continuous homomorphism {\mathbb R \rightarrow S^1} is determined by the fiber above {1 \in S^1}. (Covering projections, anyone?)
  • {\widehat{\mathbb Z/n\mathbb Z} \cong \mathbb Z/n\mathbb Z}, characters {\xi} being determined by the image {\xi(1) \in \mathbb T}.
  • {\widehat{G \times H} \cong \widehat G \times \widehat H}.
  • If {Z} is a finite abelian group, then the previous two examples (and the structure theorem for finite abelian groups) imply that {\widehat{Z} \cong Z}, though not canonically. You may now recognize that the bilinear form {\cdot : Z \times Z \rightarrow \mathbb T} is exactly a choice of isomorphism {Z \rightarrow \widehat Z}.
  • For any LCA group {G}, the dual of {\widehat G} is canonically isomorphic to {G}; that is, there is a natural isomorphism

    \displaystyle  G \cong \widehat{\widehat G} \qquad \text{by} \qquad x \mapsto \left( \xi \mapsto \xi(x) \right).

    This is the Pontryagin duality theorem. (It is analogous to the isomorphism {(V^\vee)^\vee \cong V} for finite-dimensional vector spaces {V}.)

5.2. The orthonormal basis in the compact case

Now assume {G} is LCA but also compact, and thus has a unique Haar measure {\mu} such that {\mu(G) = 1}; this lets us integrate over {G}. Let {L^2(G)} be the space of square-integrable functions to {\mathbb C}, i.e.

\displaystyle  L^2(G) = \left\{ f : G \rightarrow \mathbb C \quad\text{such that}\quad \int_G |f|^2 \; d\mu < \infty \right\}.

Thus we can equip it with the inner form

\displaystyle  \left< f,g \right> = \int_G f\overline{g} \; d\mu.

In that case, we get all the results we wanted before:

Theorem 16 (The characters form an orthonormal basis)

Assume {G} is LCA and compact. Then {\widehat G} is discrete, and the characters

\displaystyle  (e_\xi)_{\xi \in \widehat G} \qquad\text{by}\qquad e_\xi(x) = e(\xi(x)) = \exp(2\pi i \xi(x))

form an orthonormal basis of {L^2(G)}. Thus for each {f \in L^2(G)} we have

\displaystyle  f = \sum_{\xi \in \widehat G} \widehat f(\xi) e_\xi

where

\displaystyle  \widehat f(\xi) = \left< f, e_\xi \right> = \int_G f(x) \exp(-2\pi i \xi(x)) \; d\mu.

The sum {\sum_{\xi \in \widehat G}} makes sense since {\widehat G} is discrete (when {\widehat G} is infinite, it converges in {L^2(G)}, not necessarily pointwise). In particular,

  • Letting {G = Z} gives “Fourier transform on finite groups”.
  • The special case {G = \mathbb Z/n\mathbb Z} has its own Wikipedia page.
  • Letting {G = \mathbb T} gives the “Fourier series” from earlier.
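For {G = \mathbb Z/n\mathbb Z} everything in Theorem 16 is finite, so it can be checked directly. This sketch assumes the normalized Haar measure is counting measure divided by {n} and uses the characters {e_\xi(x) = e(\xi x/n)}. It builds the Gram matrix of the characters, which should be the identity, and then rebuilds a sample {f} from its coefficients {\widehat f(\xi) = \left< f, e_\xi \right>}.

```python
import cmath


def e(theta):
    """e(theta) = exp(2 pi i theta)."""
    return cmath.exp(2j * cmath.pi * theta)


def character(xi, n):
    """The character x -> e(xi * x / n) of Z/nZ."""
    return lambda x: e(xi * x / n)


def inner(f, g, n):
    """<f, g> = integral of f * conj(g) d(mu), where mu is the Haar measure with
    mu(Z/nZ) = 1, i.e. counting measure divided by n."""
    return sum(f(x) * g(x).conjugate() for x in range(n)) / n


def fourier_coefficients(values):
    """hat f(xi) = <f, e_xi> for f given by its list of values on Z/nZ."""
    n = len(values)
    return [inner(lambda x: values[x], character(xi, n), n) for xi in range(n)]


def inverse(coeffs):
    """Rebuild f(x) = sum over xi of hat f(xi) e_xi(x)."""
    n = len(coeffs)
    return [sum(c * character(xi, n)(x) for xi, c in enumerate(coeffs)) for x in range(n)]


n = 6
gram = [[inner(character(a, n), character(b, n), n) for b in range(n)] for a in range(n)]
values = [3, 1, 4, 1, 5, 9]
coeffs = fourier_coefficients(values)
print([round(v.real, 9) for v in inverse(coeffs)])
```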

5.3. The Fourier transform in the non-compact case

If {G} is LCA but not compact, then Theorem 16 becomes false. It is still possible to define a transform, but one needs to be a little more careful. The typical example to keep in mind in what follows is {G = \mathbb R}.

In what follows, we fix a Haar measure {\mu} for {G}. (This {\mu} is still unique up to scaling, but since {\mu(G) = \infty} we can no longer single one out by requiring {\mu(G) = 1}.)

This time one considers the space {L^1(G)} of absolutely integrable functions. Then one directly defines the Fourier transform of {f \in L^1(G)} to be

\displaystyle  \widehat f(\xi) = \int_G f \overline{e_\xi} \; d\mu

imitating the previous definition, even though there is no longer an inner product. This {\widehat f} need not lie in {L^1(\widehat G)}, but it is at least bounded, since {|\widehat f(\xi)| \le \int_G |f| \; d\mu}. We can still salvage the following:

Theorem 17 (Fourier inversion on {L^1(G)})

Take an LCA group {G} and fix a Haar measure {\mu} on it. One can select a unique dual measure {\widehat \mu} on {\widehat G} such that whenever {f \in L^1(G)} and {\widehat f \in L^1(\widehat G)}, the “Fourier inversion formula”

\displaystyle  f(x) = \int_{\widehat G} \widehat f(\xi) e_\xi(x) \; d\widehat\mu

holds almost everywhere. It holds everywhere if {f} is continuous.

Notice the extra nuance of having to select measures, because it is no longer the case that {G} has a single distinguished measure.
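To see the normalization in the most familiar case, take {G = \mathbb R} with Lebesgue measure and characters {e_\xi(x) = e(\xi x)}. With this convention the dual measure is Lebesgue measure again, and {x \mapsto e^{-\pi x^2}} is its own Fourier transform. The rough numerical check below uses a plain midpoint rule on a truncated interval, chosen only for simplicity; the helper names are ours.

```python
import cmath
import math


def fourier_transform(f, xi, lo=-8.0, hi=8.0, steps=4000):
    """Approximate hat f(xi) = integral of f(x) exp(-2 pi i xi x) dx by the midpoint
    rule on [lo, hi]; adequate here because f decays extremely fast."""
    dx = (hi - lo) / steps
    total = 0j
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        total += f(x) * cmath.exp(-2j * math.pi * xi * x)
    return total * dx


def gaussian(x):
    return math.exp(-math.pi * x * x)


for xi in (0.0, 0.5, 1.0, 1.5):
    approx = fourier_transform(gaussian, xi)
    print(xi, round(approx.real, 8), round(gaussian(xi), 8))
```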

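Despite the fact that the {e_\xi} no longer form an orthonormal basis, the transformed function {\widehat f : \widehat G \rightarrow \mathbb C} is still often useful. A few special {G} have their own names:

  • Letting {G = \mathbb R} gives the “continuous Fourier transform”.
  • Letting {G = \mathbb Z} gives the “discrete time Fourier transform”.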

5.4. Summary

In summary,

  • Given any LCA group {G}, we can transform sufficiently nice functions on {G} into functions on {\widehat G}.
  • If {G} is compact, then we have the nicest situation possible: {L^2(G)} is an inner product space with {\left< f,g \right> = \int_G f \overline{g} \; d\mu}, and the {e_\xi} form an orthonormal basis as {\xi} ranges over {\widehat G}.
  • If {G} is not compact, then we no longer get an orthonormal basis or even an inner product space, but it is still possible to define the transform

    \displaystyle  \widehat f : \widehat G \rightarrow \mathbb C

    for {f \in L^1(G)}. If {\widehat f} is also in {L^1(\widehat G)}, we still get a “Fourier inversion formula” expressing {f} in terms of {\widehat f}.

The following table summarizes the various flavors of Fourier analysis for various {G}. In the first half {G} is compact; in the second half it is not.

\displaystyle  \begin{array}{llll} \hline \text{Name} & \text{Domain }G & \text{Dual }\widehat G & \text{Characters} \\ \hline \textbf{Binary Fourier analysis} & \{\pm1\}^n & S \subseteq \left\{ 1, \dots, n \right\} & \prod_{s \in S} x_s \\ \textbf{Fourier transform on finite groups} & Z & \xi \in \widehat Z \cong Z & e(\xi \cdot x) \\ \textbf{Discrete Fourier transform} & \mathbb Z/n\mathbb Z & \xi \in \mathbb Z/n\mathbb Z & e(\xi x / n) \\ \textbf{Fourier series} & \mathbb T \cong [-\pi, \pi] & n \in \mathbb Z & \exp(inx) \\ \hline \textbf{Continuous Fourier transform} & \mathbb R & \xi \in \mathbb R & e(\xi x) \\ \textbf{Discrete time Fourier transform} & \mathbb Z & \xi \in \mathbb T \cong [-\pi, \pi] & \exp(i \xi n) \\ \end{array}

You might notice that the various names are awful. This is part of the reason I got confused as a high school student: every type of Fourier series above has its own Wikipedia article. If it were up to me, we would just use the term “{G}-Fourier transform”, and that would make everyone’s lives a lot easier.

6. Peter-Weyl

In fact, if {G} is a compact Lie group, then even if {G} is not abelian we can still give an orthonormal basis of {L^2(G)} (the square-integrable functions on {G}). In this case the role of the characters is played by the complex irreducible representations of {G} (in what follows, all representations are complex).

The result is given by the Peter-Weyl theorem. First, we need the following result:

Lemma 18 (Compact Lie groups have unitary reps)

Any finite-dimensional (complex) representation {V} of a compact Lie group {G} is unitary, meaning it can be equipped with a {G}-invariant inner form. Consequently, {V} is completely reducible: it splits into the direct sum of irreducible representations of {G}.

Proof: Let {B : V \times V \rightarrow \mathbb C} be any inner product on {V}. Equip {G} with a right-invariant Haar measure {dg}. Then we can equip {V} with an “averaged” inner form

\displaystyle  \widetilde B(v,w) = \int_G B(gv, gw) \; dg.

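Then {\widetilde B} is the desired {G}-invariant inner form: for {h \in G}, right-invariance of {dg} gives {\widetilde B(hv, hw) = \int_G B(ghv, ghw) \; dg = \widetilde B(v,w)}. Now, the fact that {V} is completely reducible follows from the fact that given a subrepresentation of {V}, its orthogonal complement is also a subrepresentation. \Box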
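The averaging trick can be seen concretely for a finite group, which is a compact Lie group of dimension 0 whose normalized Haar measure is counting measure divided by {|G|}. The hypothetical example below takes {G = \mathbb Z/2} acting on {\mathbb R^2} through a matrix {A} with {A^2 = I} that is not orthogonal. This is a real stand-in for the complex case, chosen for readability. If {B} has Gram matrix {M_0}, then {B(gv, gw)} has Gram matrix {g^T M_0 g}, so {\widetilde B} has Gram matrix {\frac{1}{|G|}\sum_g g^T M_0 g}.

```python
from fractions import Fraction


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def averaged_gram(group, gram):
    """Gram matrix of tilde B(v, w) = (1/|G|) sum_g B(g v, g w), where `gram` is the
    Gram matrix of B. The integral over G becomes a normalized sum over the group."""
    n = len(gram)
    total = [[Fraction(0)] * n for _ in range(n)]
    for g in group:
        term = matmul(transpose(g), matmul(gram, g))
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return [[entry / len(group) for entry in row] for row in total]


I = [[1, 0], [0, 1]]
A = [[1, 1], [0, -1]]   # A times A is the identity, but A is not orthogonal
group = [I, A]          # a non-unitary representation of Z/2
standard = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]

M = averaged_gram(group, standard)
print(M)
# Invariance check: g^T M g == M for every g in the group.
print(all(matmul(transpose(g), matmul(M, g)) == M for g in group))
```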

The Peter-Weyl theorem then asserts that the finite-dimensional irreducible unitary representations essentially give an orthonormal basis for {L^2(G)}, in the following sense. Let {V = (V, \rho)} be such a representation of {G}, and fix an orthonormal basis {e_1}, \dots, {e_d} for {V} (where {d = \dim V}). The {(i,j)}th matrix coefficient for {V} is then given by

\displaystyle  G \xrightarrow{\rho} \mathop{\mathrm{GL}}(V) \xrightarrow{\pi_{ij}} \mathbb C

where {\pi_{ij}} is the projection onto the {(i,j)}th entry of the matrix. We abbreviate {\pi_{ij} \circ \rho} to {\rho_{ij}}. Then the theorem is:

Theorem 19 (Peter-Weyl)

Let {G} be a compact Lie group. Let {\Sigma} denote the (pairwise non-isomorphic) irreducible finite-dimensional unitary representations of {G}. Then

\displaystyle  \left\{ \sqrt{\dim V} \rho_{ij} \; \Big\vert \; (V, \rho) \in \Sigma, \text{ and } 1 \le i,j \le \dim V \right\}

is an orthonormal basis of {L^2(G)}.

Strictly, I should say {\Sigma} is a set of representatives of the isomorphism classes of irreducible unitary representations, one for each isomorphism class.
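A finite group is a compact Lie group of dimension 0, so Theorem 19 can be checked by hand on a small non-abelian example. For {G = S_3}, realized as the six symmetries of an equilateral triangle, a set {\Sigma} consists of the trivial representation, the sign representation (here {\det}), and the 2-dimensional representation by rotation and reflection matrices. The sketch assumes these three are the irreducibles, as is standard, and uses the normalized counting measure. The {1 + 1 + 4 = 6} functions {\sqrt{\dim V}\,\rho_{ij}} should have the identity as their Gram matrix.

```python
import math
from itertools import product


def rotation(theta):
    return [[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]]


def reflection(theta):
    return [[math.cos(theta), math.sin(theta)], [math.sin(theta), -math.cos(theta)]]


# The six symmetries of an equilateral triangle: a 2-dimensional irreducible
# unitary (here real orthogonal) representation rho of S_3.
angles = [2 * math.pi * k / 3 for k in range(3)]
group = [rotation(t) for t in angles] + [reflection(t) for t in angles]


def det(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


# sqrt(dim V) * rho_ij for each irreducible: trivial (dim 1), sign = det (dim 1),
# and the 2-dimensional rho above.
basis = [lambda g: 1.0, det]
for i, j in product(range(2), repeat=2):
    basis.append(lambda g, i=i, j=j: math.sqrt(2) * g[i][j])


def inner(f, h):
    """<f, h> for the normalized counting measure on the 6-element group
    (every function here is real, so no complex conjugate is needed)."""
    return sum(f(g) * h(g) for g in group) / len(group)


gram = [[inner(f, h) for h in basis] for f in basis]
for row in gram:
    print([round(x, 10) for x in row])
```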

In the special case where {G} is abelian, all irreducible representations are one-dimensional. A one-dimensional representation of {G} is a continuous homomorphism {G \rightarrow \mathop{\mathrm{GL}}(\mathbb C) \cong \mathbb C^\times}, but the unitary condition implies it actually lands in {S^1 \cong \mathbb T}, i.e. it is an element of {\widehat G}.

Uniqueness of Solutions for DiffEq’s

Let {V} be a normed finite-dimensional real vector space and let {U \subseteq V} be an open set. A vector field on {U} is a function {\xi : U \rightarrow V}. (In the words of Gaitsgory: “you should imagine a vector field as a domain, and at every point there is a little vector growing out of it.”)

The idea of a differential equation is as follows. Imagine your vector field specifies a velocity at each point. So you initially place a particle somewhere in {U}, and then let it move freely, guided by the arrows in the vector field. (There are plenty of good pictures online.) Intuitively, for nice {\xi} the resulting trajectory should be unique. This is the main take-away; the proof itself is just for completeness.

This is a so-called differential equation:

Definition 1

Let {\gamma : (-\varepsilon, \varepsilon) \rightarrow U} be a differentiable path. We say {\gamma} is a solution to the differential equation defined by {\xi} if for each {t \in (-\varepsilon, \varepsilon)} we have

\displaystyle  \gamma'(t) = \xi(\gamma(t)).

Example 2 (Examples of DE’s)

Let {U = V = \mathbb R}.

  1. Consider the vector field {\xi(x) = 1}. Then the solutions {\gamma} are just {\gamma(t) = t+c}.
  2. Consider the vector field {\xi(x) = x}. Then {\gamma} is a solution exactly when {\gamma'(t) = \gamma(t)}, and it is well known that these are precisely the functions {\gamma(t) = c\exp(t)}.

Of course, you may be used to seeing differential equations which are time-dependent, such as {\gamma'(t) = t}. In fact, you can hack these to fit into the current model using the idea that time is itself just another dimension. Suppose we want to model {\gamma'(t) = F(\gamma(t), t)}. Then we instead consider

\displaystyle  \xi : V \times \mathbb R \rightarrow V \times \mathbb R \qquad\text{by}\qquad \xi(v, t) = (F(v,t), 1)

and solve the resulting differential equation over {V \times \mathbb R}. This does exactly what we want. Geometrically, this means making time into another dimension and imagining that our particle moves at a “constant speed through time”.
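Here is a minimal sketch of the time-as-a-dimension trick. It lifts {\gamma'(t) = F(\gamma(t), t)} to the autonomous field {\xi(v, t) = (F(v,t), 1)} and integrates that with forward Euler. Euler is used only because it is the simplest integrator; nothing in the argument depends on it, and the helper names are hypothetical. For {F(v, t) = t} with {\gamma(0) = 0}, the exact solution is {t^2/2}.

```python
def euler(field, state, t_end, steps):
    """Forward Euler for the autonomous system state' = field(state)."""
    h = t_end / steps
    for _ in range(steps):
        state = tuple(s + h * d for s, d in zip(state, field(state)))
    return state


def lift(F):
    """Turn gamma'(t) = F(gamma(t), t) into the autonomous field xi(v, t) = (F(v, t), 1)."""
    return lambda state: (F(state[0], state[1]), 1.0)


# Example: gamma'(t) = t with gamma(0) = 0, whose exact solution is t^2 / 2.
xi = lift(lambda v, t: t)
v, t = euler(xi, (0.0, 0.0), t_end=2.0, steps=20000)
print(v, t)  # v is close to 2.0 = 2^2 / 2, and the time coordinate t is close to 2.0
```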

The task is then mainly about finding which conditions guarantee that our differential equation behaves nicely. The answer turns out to be:

Definition 3

The vector field {\xi : U \rightarrow V} satisfies the Lipschitz condition if

\displaystyle  \left\lVert \xi(x')-\xi(x'') \right\rVert \le \Lambda \left\lVert x'-x'' \right\rVert

holds for all {x', x'' \in U}, for some fixed constant {\Lambda}.

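Note that if {\xi} is continuously differentiable, then it is Lipschitz on every compact convex subset of {U} (by the mean value inequality), although not necessarily on all of {U}. This local version is all the proof below uses, since it works inside a closed ball.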

Theorem 4 (Picard-Lindelöf)

Let {V} be a finite-dimensional real vector space, and let {\xi} be a vector field on a domain {U \subseteq V} which satisfies the Lipschitz condition.

Then for every {x_0 \in U} there exist {\varepsilon > 0} and {\gamma : (-\varepsilon,\varepsilon) \rightarrow U} such that {\gamma'(t) = \xi(\gamma(t))} and {\gamma(0) = x_0}. Moreover, if {\gamma_1} and {\gamma_2} are two solutions and {\gamma_1(t) = \gamma_2(t)} for some {t}, then {\gamma_1 = \gamma_2} on their common domain.

In fact, Peano’s existence theorem says that if we replace Lipschitz continuity with just continuity, then {\gamma} exists but need not be unique. For example:

Example 5 (Counterexample if {\xi} is not Lipschitz)

Let {U = V = \mathbb R} and consider {\xi(x) = x^{\frac23}}, with {x_0 = 0}. Then {\gamma(t) = 0} and {\gamma(t) = \left( t/3 \right)^3} are both solutions to the differential equation

\displaystyle  \gamma'(t) = \gamma(t)^{\frac 23}.
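The non-uniqueness is easy to confirm numerically. The sketch below interprets {x^{2/3}} through the real cube root, so {\xi(x) = |x|^{2/3}}. It estimates {\gamma'} by a central difference and checks that both paths start at {x_0 = 0} and satisfy the equation on {[-2, 2]} up to small numerical error. The step size and grid are arbitrary choices.

```python
def xi(x):
    """x^(2/3) via the real cube root, i.e. |x|^(2/3), defined for negative x too."""
    return abs(x) ** (2 / 3)


def residual(gamma, t, h=1e-6):
    """|gamma'(t) - xi(gamma(t))|, with gamma' estimated by a central difference."""
    derivative = (gamma(t + h) - gamma(t - h)) / (2 * h)
    return abs(derivative - xi(gamma(t)))


solutions = {
    "zero": lambda t: 0.0,
    "cubic": lambda t: (t / 3) ** 3,
}
for name, gamma in solutions.items():
    worst = max(residual(gamma, k / 100) for k in range(-200, 201))
    print(name, gamma(0.0), worst)  # both start at x0 = 0 with tiny residuals
```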

Now, for the proof of the main theorem. The main idea is the following result (sometimes called the contraction principle).

Lemma 6 (Banach Fixed-Point Theorem)

Let {(X,d)} be a nonempty complete metric space. Let {f : X \rightarrow X} be a map such that {d(f(x_1), f(x_2)) \le \frac{1}{2} d(x_1, x_2)} for any {x_1, x_2 \in X}. Then {f} has a unique fixed point.
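The proof of the contraction principle is constructive: iterate {f} from any starting point. Here is a minimal sketch on the complete metric space {\mathbb R}, using the hypothetical map {f(x) = \frac12\cos x}. Since {|f'(x)| \le \frac12}, the mean value theorem gives {|f(a) - f(b)| \le \frac12|a-b|}. Consecutive distances then shrink at least geometrically, so the iterates are Cauchy, and every starting point should lead to the same fixed point.

```python
import math


def banach_iterate(f, x0, tol=1e-12, max_iter=200):
    """Iterate x -> f(x) from x0 until successive iterates are within tol.

    If d(f(a), f(b)) <= d(a, b) / 2, then d(x_k, x_{k+1}) <= 2^{-k} d(x_0, x_1),
    so the iterates form a Cauchy sequence whose limit is the fixed point.
    """
    x = x0
    for _ in range(max_iter):
        nxt = f(x)
        if abs(nxt - x) < tol:
            return nxt
        x = nxt
    raise RuntimeError("did not converge")


def f(x):
    # |d/dx (cos x / 2)| = |sin x| / 2 <= 1/2, so f satisfies the hypothesis on R.
    return math.cos(x) / 2


for start in (-10.0, 0.0, 3.0):
    p = banach_iterate(f, start)
    print(start, p, abs(f(p) - p))  # the same fixed point from every start
```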

For the proof of the main theorem, we are given {x_0 \in U}. Let {X} be the metric space of continuous functions from {(-\varepsilon, \varepsilon)} to the complete metric space {\overline{B}(x_0, r)}, the closed ball of radius {r} centered at {x_0}. (Here {r > 0} can be arbitrary, so long as the ball lies inside {U}.) It turns out that {X} is itself a complete metric space when equipped with the sup metric

\displaystyle  d(f, g) = \sup_{t \in (-\varepsilon, \varepsilon)} \left\lVert f(t)-g(t) \right\rVert.

This is well-defined since {\overline{B}(x_0, r)} is compact.

We wish to use the Banach theorem on {X}, so we’ll rig a function {\Phi : X \rightarrow X} with the property that its fixed points are solutions to the differential equation. Define it by, for every {\gamma \in X},

\displaystyle  \Phi(\gamma) : t \mapsto x_0 + \int_0^t \xi(\gamma(s)) \; ds.

This function is contrived so that {(\Phi\gamma)(0) = x_0} and {\Phi\gamma} is differentiable (hence continuous). By the Fundamental Theorem of Calculus, the derivative is given by

\displaystyle  (\Phi\gamma)'(t) = \left( \int_0^t \xi(\gamma(s)) \; ds \right)' = \xi(\gamma(t)).

In particular, fixed points correspond exactly to solutions to our differential equation.

A priori this output has signature {\Phi\gamma : (-\varepsilon,\varepsilon) \rightarrow V}, so we need to check that {\Phi\gamma(t) \in \overline{B}(x_0, r)}. We can check that

\displaystyle  \begin{aligned} \left\lVert (\Phi\gamma)(t) - x_0 \right\rVert &=\left\lVert \int_0^t \xi(\gamma(s)) \; ds \right\rVert \\ &\le \left\lvert \int_0^t \left\lVert \xi(\gamma(s)) \right\rVert \; ds \right\rvert \\ &\le |t| \max_{|s| \le |t|} \left\lVert \xi(\gamma(s)) \right\rVert \\ &\le \varepsilon \cdot A \end{aligned}

where {A = \max_{x \in \overline{B}(x_0,r)} \left\lVert \xi(x) \right\rVert}; we have {A < \infty} since {\overline{B}(x_0,r)} is compact. Hence by selecting {\varepsilon < r/A}, the above is bounded by {r}, so {\Phi\gamma} indeed maps into {\overline{B}(x_0, r)}. (Note that at this point we have not used the Lipschitz condition, only that {\xi} is continuous.)

It remains to show that {\Phi} is contracting. Write

\displaystyle  \begin{aligned} \left\lVert (\Phi\gamma_1)(t) - (\Phi\gamma_2)(t) \right\rVert &= \left\lVert \int_0^t \left( \xi(\gamma_1(s))-\xi(\gamma_2(s)) \right) \; ds \right\rVert \\ &\le \left\lvert \int_0^t \left\lVert \xi(\gamma_1(s))-\xi(\gamma_2(s)) \right\rVert \; ds \right\rvert \\ &\le |t|\Lambda \sup_{|s| \le |t|} \left\lVert \gamma_1(s)-\gamma_2(s) \right\rVert \\ &\le \varepsilon\Lambda \, d(\gamma_1, \gamma_2). \end{aligned}

Hence once again, for {\varepsilon} sufficiently small we get {\varepsilon\Lambda \le \frac{1}{2}}. Since the above holds for every {t \in (-\varepsilon, \varepsilon)}, taking the supremum over {t} gives

\displaystyle  d(\Phi\gamma_1, \Phi\gamma_2) \le \frac{1}{2} d(\gamma_1, \gamma_2)

as needed.
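The map {\Phi} can also be run numerically, which is just Picard iteration. This sketch makes several choices of its own: it takes {\xi(x) = x} and {x_0 = 1}, samples paths on a uniform grid over {[-\varepsilon, \varepsilon]}, and approximates {\int_0^t} by the trapezoid rule. Here {\Lambda = 1}, and with {r = 1} we get {A = 2}, so {\varepsilon = 0.4} satisfies both {\varepsilon < r/A} and {\varepsilon\Lambda \le \frac12}. Starting from the constant path {x_0}, the iterates should approach {\exp(t)} up to discretization error.

```python
import math


def picard_step(xi, gamma, x0, h, mid):
    """One application of Phi: (Phi gamma)(t) = x0 + integral_0^t xi(gamma(s)) ds.

    gamma is a list of samples on a uniform grid with spacing h whose index `mid`
    is t = 0; the integral is approximated by the trapezoid rule in both directions.
    """
    values = [xi(g) for g in gamma]
    out = [0.0] * len(gamma)
    out[mid] = x0
    for k in range(mid + 1, len(gamma)):   # t > 0: integrate forward
        out[k] = out[k - 1] + h * (values[k - 1] + values[k]) / 2
    for k in range(mid - 1, -1, -1):       # t < 0: integral_0^t = -integral_t^0
        out[k] = out[k + 1] - h * (values[k] + values[k + 1]) / 2
    return out


eps, m = 0.4, 400
h = eps / m
ts = [-eps + k * h for k in range(2 * m + 1)]
gamma = [1.0] * len(ts)                    # start from the constant path x0 = 1
for n in range(1, 16):
    new = picard_step(lambda x: x, gamma, 1.0, h, m)
    step = max(abs(a - b) for a, b in zip(new, gamma))
    gamma = new
    if n % 5 == 0:
        print(n, step)                     # sup distance between consecutive iterates

error = max(abs(g - math.exp(t)) for g, t in zip(gamma, ts))
print("sup |gamma - exp| =", error)
```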

This is a cleaned-up version of a portion of a lecture from Math 55b in Spring 2015, instructed by Dennis Gaitsgory.